Journal of Experimental Psychology


Arthur W. Melton, Editor
Department of Psychology, University of Michigan
Ann Arbor, Michigan

David A. Grant, Associate Editor, University of Wisconsin
Delos D. Wickens, Associate Editor, Ohio State University

Consulting Editors
E. James Archer, University of Wisconsin
Judson S. Brown, State University of Iowa
Cletus J. Burke, Indiana University
Paul M. Fitts, Ohio State University
Frederick C. Frick, Massachusetts Institute of Technology
Frank A. Geldard, University of Virginia
James J. Gibson, Cornell University
Clarence H. Graham, Columbia University
Harold W. Hake, University of Illinois
Lloyd G. Humphreys, University of Illinois
Arthur L. Irion, Tulane University
Howard H. Kendler, New York University
Donald B. Lindsley, University of California, Los Angeles
Kenneth MacCorquodale, University of Minnesota
Quinn McNemar, Stanford University
Neal E. Miller, Yale University
Kenneth W. Spence, State University of Iowa

Arthur C. Hoffman, Managing Editor
Helen Orr, Assistant Managing Editor
Editorial Staff: Sadie J. Doyle, Frances H. Clark, Barbara Cummings, Sarah Womack





CONTENTS

Ratio Scales and Category Scales for a Dozen Perceptual Continua: S. S. Stevens and E. H. Galanter 377
Intelligibility as a Function of Frequency of Usage: M. R. Rosenzweig and L. Postman 412
Selective Sampling in Discrimination Learning: D. L. LaBerge and A. Smith 423
The Semantic Differential and Mediated Generalization as Measures of Meaning: L. Lipton and R. L. Blanton 431
Cutaneous Discrimination of Radiant Heat: W. H. Teichner
The Role of Prefeeding in an Apparent Frustration Effect: J. P. Seward, A. C. Pereboom, B. Butler, and R. B. Jones 445
Resistance to Extinction as a Function of the Discrimination Habit Established During Fixed-Ratio Reinforcement: M. R. Denny, R. H. Wells, and J. L. Maatsch 451
Judgments of Visual Velocity as a Function of Length of Observation Time: A. G. Goldstein 457
Effect of Amount and Timp Teterved Beles Tot on Dataiee 
; in Rats: J. P. Sa wm pdivccddbccdeveaceestees 462 
Gane of Human 
— Pupllay Dilation: B. Sampson, amp G. L. Bostov 467 





American Psychological Association 
Vol. 54 No. 6 December 1957 







© 1957 by the American Psychological Association, Inc. 





Journal of Experimental Psychology

Vol. 54, No. 6          December, 1957








RATIO SCALES AND CATEGORY SCALES FOR A DOZEN PERCEPTUAL CONTINUA ¹

S. S. STEVENS AND E. H. GALANTER


Psycho-Acoustic Laboratory, Harvard University 


We will examine the relation be- 
tween two classes of scaling pro- 
cedures, those that require the 
observer (O) to divide a segment of 
a continuum into a finite number of 
categories and those that allow him 
to make direct estimations of apparent 
magnitudes or of apparent ratios. 
The “category methods” include pro- 
cedures sometimes called by such 
names as rating scale, single stimuli, 
equal sense-distances, equal intervals, 
etc. They are procedures that try 
to get O to partition a segment of a 
continuum into equal intervals. The 
“ratio methods” include such pro- 
cedures as fractionation, multiplica- 
tion, constant-sum, magnitude es- 
timation, etc. These are procedures 
that try to get directly at the form of 
the ratio scale of subjective magnitude. 

The relation between category scales and ratio scales is nonlinear on one class of perceptual continua, but on another class the relation may be linear. The general rule is that category methods give nonlinear scales whenever O's sensitivity to differences (measured in subjective units) is not constant over the scale. Only when sensitivity is uniform over the continuum can we expect linear results from category ratings. Thus we find that judgmental continua divide themselves into two fundamental classes. On a Class I continuum the category scale is concave downward when plotted against the ratio scale of subjective magnitude. On a Class II continuum the category scale may be linear when so plotted.

¹ This research was carried out under Contract Nonr-1866(15) between Harvard University and the Office of Naval Research, U. S. Navy (Project Nr142-201, Report PNR-186). Reproduction for any purpose of the U. S. Government is permitted.

Class I we label prothetic, because 
it includes, among other things, 
magnitudes like heaviness, loudness, 
brightness, etc., for which discrimina- 
tion appears to be based on an 
additive mechanism by which excita- 
tion is added to excitation at the 
physiological level. Class II we label 
metathetic, because it includes pitch, 
position, etc., for which discrimination
behaves as though based on a
substitutive mechanism at the physiological
level (51, 52).²

² Following certain precedents among surgeons, the term prothetic is used in lieu of prosthetic, which derives from the Greek word meaning “to add.” Metathetic derives from a word meaning “to change or substitute.”







The distinction between Class I 
(prothetic) and Class II (metathetic) 
is something like the traditional 
distinction between sensory intensity 
and sensory quality, but it is not 
quite the same. Other domains than 
intensity in the narrower sense behave 
prothetically, e.g., apparent length,
numerousness, and duration. The 
distinction is in some ways like the 
distinction of quantity vs. kind—or 
size vs. sort. What is needed, how- 
ever, is a functional definition of 
these classes—one that leaves us 
free to classify continua simply on 
the basis of how they behave (see 
61). 


CONTINUA OF CLASS I
(PROTHETIC)


General Procedure 


The general procedure in the experiments on 
category scaling is exemplified in the following 
section on length of lines. Unless otherwise in- 
dicated, it may be assumed that each of ten Os 
made two judgments of each stimulus (occa- 
sionally only one judgment per stimulus). At 
the outset of the session O was shown the two 
extreme stimuli. This was to acquaint O with 
the range, a procedure that saves time and seems 
to reduce O’s tendency to avoid the end cate- 
gories of the scale—the “central tendency” or 
“regression effect” (25). 

The stimuli were presented in a different
irregular order to each O. This “randomization”
of order is designed to cancel out the second-
order effect that leads O to change categories
whenever he notes a change in the stimulus, even
when the change is too small to justify a jump to
another category. These effects of a preceding
judgment on a subsequent judgment have been
analyzed by Garner (12). Analogous effects
under the method of constant stimuli are dis-
cussed by Preston (43). Presumably one can
balance out these intraserial effects by presenting
the stimuli in a different irregular order to each
O. The category scale values are plotted as the
arithmetic means of the category assignments.

In the experiments designed to produce a 
ratio scale of a psychological continuum we used 
the method of magnitude estimation. The 
stimuli were presented in a different irregular 
order to each O, whose task was to assign num- 
bers proportional to the apparent magnitude. 
The O could use any numbers he thought appropriate.
The stimulus range was not disclosed at the outset.

Since magnitude estimates usually give a 
skewed distribution, medians were usually com- 
puted. When the distributions were not skewed, 
means were sometimes used. 
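
The choice of medians for skewed magnitude estimates can be illustrated with a minimal sketch. The ten estimates below are hypothetical numbers invented for illustration, not data from these experiments; they only show how one very large estimate pulls the mean upward while leaving the median nearly untouched.

```python
import statistics

# Hypothetical magnitude estimates from ten observers for a single stimulus.
# One observer gives a very large number, so the distribution is skewed.
estimates = [8, 9, 10, 10, 10, 11, 12, 12, 15, 60]

mean_estimate = statistics.mean(estimates)      # pulled upward by the 60
median_estimate = statistics.median(estimates)  # middle of the ordered values

print(f"mean   = {mean_estimate}")
print(f"median = {median_estimate}")
```

With these illustrative numbers the mean lies well above the median, which is the situation in which the median is the more representative summary.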

For a detailed discussion of some of the 
problems involved in the method of magnitude 
estimation see Stevens (59). 


Length of Lines 


In order to make the issues clear, let 
us consider one of the simpler types of 
judgment: perceived length of lines. If 
O cannot equalize the categories on his 
rating scale when his task is to judge the 
length of lines, it seems plausible to ex- 
pect that he will do even less well when 
judging other perceptual continua. 


Ten Os judged the lengths of 17 steel rods in 
terms of an 11-point scale, using the numbers 1 
to 11. The rods ranged from 4 to 111 cm.
They were placed one at a time in “random” 
order on the table in front of O who saw each 
rod for about 1 sec. and then made his judgment.
At the beginning of each test, O was shown a rod 
3 cm. long and told that it was slightly shorter 
than any in the series. He was also shown a 
rod 112 cm. long and told that it was slightly 
longer than any in the series. 

Another ten Os followed a different procedure. 
They were merely shown the rods in random 
order and instructed to estimate the length of the 
rods in inches. In both experiments the order of 
presentation was different for each O. 


In Fig. 1 the means for each type of 
judgment are plotted against physical
length. The obvious fact is that the
category judgments follow a curve that is 
concave downward, whereas the mag- 
nitude estimations follow a curve that is 
slightly concave upward. It follows 
that, if we were to plot the category 
judgments against the scale obtained by 
magnitude estimation, the curve would 
be even more concave than it is when 
plotted against centimeters. 
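
The inference in the last sentence can be checked numerically. The sketch below uses two hypothetical curves chosen only to mimic the shapes described (a category curve that is concave downward and a magnitude curve that is slightly concave upward); the exponents 0.7 and 1.1 and the helper `bow_above_chord` are assumptions for illustration, not values or methods from the experiment.

```python
import numpy as np

# Hypothetical curves that only mimic the shapes described in the text.
s = np.linspace(4.0, 111.0, 201)                      # physical length, cm
category = 1.0 + 10.0 * ((s - 4.0) / 107.0) ** 0.7    # concave downward in s
magnitude = s ** 1.1                                  # slightly concave upward in s

def bow_above_chord(x, y):
    """Largest height of the curve above the straight line joining its
    end points, with both axes normalized to the range 0..1."""
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y[0]) / (y[-1] - y[0])
    return float(np.max(yn - xn))

bow_vs_cm = bow_above_chord(s, category)               # category vs. centimeters
bow_vs_magnitude = bow_above_chord(magnitude, category)  # category vs. magnitude

print(bow_vs_cm, bow_vs_magnitude)
```

Under these assumptions the bow is larger when the category judgments are plotted against the magnitude scale, because the upward-curving magnitude scale compresses the low end of the abscissa.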

The slightly curved line drawn through 
the magnitude estimations is a segment 
of the subjective scale of apparent length 
derived from fractionation judgments by 
Reese and others (44). They named the 
unit of this scale a mak (Greek for 
length). Thus we have two independent 
experiments confirming the rather obvious
fact that apparent length is very
nearly a linear function of physical 
length. If we take the subjective scale 
in maks to be a valid scale of apparent 
length, it follows that the rating scale 
consisting of finite categories from 1 to 11 
is nonlinear. This means that successive
intervals along the rating scale are not
equal in subjective magnitude.

The category widths on the rating
scale are small at the lower end, which 
makes the slope steep, and large at the 
upper end, which makes the slope flat. 
Figure 1 suggests that the width of the 
middle categories remains fairly con- 
stant. This is apparently a fact peculiar 
to this particular experiment on length, 
for in most other judgments the width of 
the categories increases steadily as we go 
up the scale. Length is a simple and
easy thing to judge and O does not allow
the middle category on his rating scale
to stray very far from the middle stimulus
in the series. Nevertheless it does
stray, and the problem is, why?


The why of it—What happens when O rates 
an apparent magnitude on an N-point scale is the 
result of the interplay of three important 
“forces” plus an unknown number of lesser ones. 
The three important factors that interact are: 
intent, discrimination, and expectation. By 
these terms we mean the following: (a) Intent:
a properly instructed O tries to make the inter- 
vals on his category scale equal in width and 
thereby produce a linear scale. He aims at a 
true interval scale. (b) Discrimination: O's 
ability to tell one magnitude from another varies 
over the scale and affects the width of the 
categories. Where discrimination is good the 
categories tend to be narrow; where discrimina- 
tion is poor the categories become wider. 
(c) Expectation: O inevitably has expectations 
regarding the relative frequency with which he 
should use the various categories. In the typical 
experiment O’s normal expectation is that he 
will be required to use the category numbers 
approximately equally often. At the end of a 
session, some Os express concern when they have 
failed to use all the categories at least once. 
This fact, as we will show, makes it possible to 
alter the form of the category scale by changing 
the distribution of the stimuli. (It should be 
noted that direct magnitude estimations are not 
easily affected by the spacing of the stimuli (59), 
which is an important point in their favor.) 

How these three factors interact depends upon circumstances. For purposes of analysis we can assume that O's intent to produce an interval scale remains constant and that discrimination and expectation operate to help or hinder O's purpose.

FIG. 1. Magnitude and category scales for apparent length. (Abscissa: length in centimeters.)

First let us see how discrimination might 
affect the outcome. We have already noted 
that the width of O's categories tends to vary 
inversely with his ability to discriminate among 
the stimuli. If this were the only factor operat- 
ing it would be possible to predict the form of the 
category scale in a precise manner. The argu- 
ment would proceed from the fact that on 
prothetic continua the increment required for a 
detectable difference is roughly proportional to 
the magnitude being judged. 

If we were to assume that the category width 
is proportional to the discriminal uncertainty, 
and that the uncertainty is proportional to the 
magnitude being judged, it follows that the 
category scale would be logarithmic. In other 
words, in the extreme case in which relative 
discriminability is the controlling factor, the 
curve relating category judgment to magnitude 
scale should approximate a logarithmic form. 
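
The step from proportional category widths to a logarithmic scale can be sketched numerically. In the sketch below, each category is given a width proportional to the magnitude at its lower edge; the function name, the starting magnitude, and the proportionality constant `weber_fraction` are hypothetical choices made for illustration.

```python
import math

def category_boundaries(lowest, n_categories, weber_fraction):
    """Upper boundaries of categories whose width is proportional to the
    magnitude at their lower edge (width = weber_fraction * lower edge)."""
    boundaries = []
    edge = lowest
    for _ in range(n_categories):
        edge = edge + weber_fraction * edge
        boundaries.append(edge)
    return boundaries

# Hypothetical values: seven categories starting at magnitude 1.0.
bounds = category_boundaries(lowest=1.0, n_categories=7, weber_fraction=0.5)

# Successive boundaries differ by a constant amount on a logarithmic axis,
# so the category number is a linear function of log magnitude.
log_steps = [math.log(b2) - math.log(b1) for b1, b2 in zip(bounds, bounds[1:])]
print(log_steps)
```

Because every boundary is a fixed multiple of the one before it, equal category steps correspond to equal logarithmic steps of magnitude, which is the logarithmic form referred to above.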

The upper curve in Fig. 1 is less curved than a 
logarithmic function, which indicates that 
discrimination is not the sole controlling factor 
in an easy judgment like length. We will see, 
however, that in other more difficult judgments, 
the logarithmic form is sometimes approached 
more closely, especially when the stimuli are 
bunched close together at the low end of the 
scale. But as a general rule the curvature of the 
category scale is less than logarithmic.

Cutting across all this is the factor of ex- 
pectation—O’s tendency to try to use the 
categories equally often. Depending on the 
spacing of the stimuli, this factor may operate 
to increase or to decrease the curvature of the 







category scale. If the stimuli are uniformly 
spaced along the psychological continuum, and 
if O expects to use the categories equally often, 
the result is a tendency for his expectation to 
straighten out the scale. Obviously, if we give 
N equally spaced stimuli to be rated on an N- 
point scale and O uses all the categories with no 
inversions, the result will be perfectly linear. 
Despite this tendency of expectation to 
straighten the scale, in no case that we have 
examined have category judgments of a Class I 
magnitude produced a linear scale when the 
stimuli were uniformly spaced along the psycho- 
logical continuum and were close enough to- 
gether to be occasionally confused. 

When the stimuli are not distributed uni- 
formly along the psychological continuum, the 
factor of expectation affects the curvature of the 
category scale by changing the local slope of the 
function. Where the stimuli are bunched close 
together the slope gets steeper. Where the 
stimuli are spread out the slope gets flatter. 
Bunching in this sense can mean either a close 
spacing of the stimuli or a more frequent 
presentation of particular stimuli. 

It appears that the slope gets steeper in the 
region of bunching because O tends to resist 
naming the same category over and over. 
This fact, also shown by Arons and Irwin (2), 
is clear when you watch the typical O perform 
in one of these experiments. He tries to spread 
his judgments, and when the stimuli are bunched 
he spreads his judgments by making his cate- 
gories narrower. 

It should be noted that Volkmann, Hunt, and 
McGourty (72) were able to demonstrate that 
increasing the density of the stimuli along a
scale does not alter the width of the judgment
categories, provided the spacing remains con- 
stant. It requires a nonuniform distribution of 
stimulus density to produce the changes in 
curvature we are attributing to expectation. 

In summary, then, it appears that the first-
order phenomena in category judgments depend
on three factors whose combined effects deter-
mine the form of the curve relating category
judgment to psychological magnitude. The O's
intent to equalize the categories works in the
direction of a linear function, but linearity is
not achieved because discrimination is not uni-
form over the scale. Where discrimination is
good the categories are narrow and the curve is
therefore steep. Where discrimination is poor
the categories become wider and the slope of the
curve declines. The effects of O's expectations
are ordinarily less potent than the effects of
discrimination, but expectation interacts with
the spacing of the stimuli to alter the slope of the
curve in various ways.




Judgments of Duration 


In judgments of the duration of bursts 
of white noise we again find that the 
category scale is nonlinear relative to the 
ratio scale of psychological magnitude. 
The results of several experiments are 
shown in Fig. 2. 

The curve labeled “5-point scale”
represents data obtained by Postman and 
Miller (41) who sounded a white noise 
and asked nine Os to judge the durations 
on a scale from 1 to 5. Despite the fact 
that O knew there were five stimuli and 
five categories, the category scale is 
clearly nonlinear against physical time. 

In order to test this matter further, we 
extended the range of durations and obtained
the two curves labeled “7-point
scale” in Fig. 2A. The bottom curve 
represents magnitude estimations of 
duration. The Os were presented with a 
duration of 1 sec. and told that it was 1 
sec. They were then asked to judge the 
apparent duration of the various stimuli, 
presented in random order. The mean 
estimates are shown in both Fig. 2A and 
2B. 

These magnitude estimations confirm 
the general form of the subjective scale of 
duration derived from fractionation judg- 
ments by Gregg (14) who proposed the 
name temp for the unit of apparent
duration. Gregg’s scale is slightly con- 
cave upward, as is also the scale obtained 
by Ross and Katchmar (47) for longer 
durations. These authors proposed 
chron as the name of the unit of apparent 
duration. Thus we have three independ- 
ent studies suggesting that the psycho- 
logical magnitude of apparent duration 
is a nearly linear, but slightly accelerat- 
ing, function of physical duration. 

The curves for the category scales in 
Fig. 2A provide convincing evidence 
that, when the stimuli are approximately 
equally spaced, the category scale is non- 
linear in the manner expected. All three 
curves are concave downward. 

We may now ask what happens when the stimuli are not equally spaced. As shown in Fig. 2B the logarithmic spacing (filled circles), which bunches the stimuli at the low end of the scale, steepens the slope over the lower segment of the curve precisely as we should expect. (Compare the filled circles with the unfilled circles.) And there is a corresponding tendency for the logarithmic spacing to flatten the slope over the upper part of the range. With the logarithmic spacing the resulting category scale is almost logarithmic, as shown by the fact that the filled circles in Fig. 2B lie almost on a straight line. In linear coordinates the curve for the logarithmic spacing would be more concave downward than any of those shown in Fig. 2A.

FIG. 2. Magnitude and category scales for apparent duration: A, linear coordinates; B, duration plotted on a logarithmic scale. (Abscissa: seconds.)

When the stimuli are bunched at the 
upper end of the scale, the factor of 
expectation increases the slope near the 
upper end and decreases it near the lower 
end. Thus we see by the crosses in 
Fig. 2A that the crowding of half the 
stimuli into the region from 3 to 4 sec., 
inclusive, makes the upper part of the 
curve extremely steep; and the category 
scale takes on almost the same form as 
the magnitude scale (squares). Ap- 
parently, by playing on O’s expectations 
in the proper way we can sometimes get 
him to distribute his judgments in such 
a way that his category scale becomes 
nearly linear. 

It is significant, however, that Os tended to be impressed with the strangeness of this distribution of stimuli. Some of them complained afterwards that most of the stimuli were long ones and they wondered why. This reaction was not observed with the other spacings.


A “pure” category scale. Suppose we ask what would be the form of the scale if it were not subject to distortions imposed by O's expectations. Since it is presumably not possible to remove from O's mind all expectations and hypotheses regarding how he should distribute his judgments, an alternative strategy would be for E to order the procedure in a way that meets the typical O's expectations with minimal discrepancy. This suggests the possibility of an iterative series of experiments designed as follows.

We will assume that O expects the series of stimuli to be so arranged that all categories appear equally often. Since at the outset we know nothing about the form of O's category scale, we present to the first group of Os a series of stimuli spaced in some arbitrary manner along the continuum. The results of this test give us a first approximation to the category scale. For the second group of Os we space the stimuli so that they reflect equal intervals on the scale obtained from the first group of Os. This gives a new curve, a second approximation to the “pure” scale. Using this better approximation we respace the stimuli and repeat the procedure with a third group of Os. We repeat this process until stability is reached, i.e., until the results of a test are such that no further change in spacing is called for.
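
The iterative procedure just described can be sketched as a short simulation. Everything about the simulated observer is an assumption made for illustration: `toy_observer` is a hypothetical stand-in for a group of Os whose mean rating blends a logarithmic "pure" scale with the rank order of the stimulus (the equal-use expectation). It is not a model proposed in this paper, and the names `respace` and `iterate_to_stability` are likewise invented here.

```python
import numpy as np

def toy_observer(stimuli, weight_discrimination=0.5):
    """Hypothetical observer (an assumption, not the authors' model).
    The mean category rating, scaled 0..1, is a blend of a logarithmic
    'pure' scale and the rank order of the stimulus in the series."""
    lo, hi = stimuli[0], stimuli[-1]
    pure = np.log(stimuli / lo) / np.log(hi / lo)
    rank = np.linspace(0.0, 1.0, len(stimuli))
    return weight_discrimination * pure + (1.0 - weight_discrimination) * rank

def respace(stimuli, ratings):
    """Choose new stimuli at equal intervals on the obtained category scale
    by linear interpolation along the curve of ratings against stimuli."""
    targets = np.linspace(ratings[0], ratings[-1], len(stimuli))
    return np.interp(targets, ratings, stimuli)

def iterate_to_stability(stimuli, tolerance=1e-9, max_rounds=1000):
    """Repeat test and respacing until no further change in spacing
    is called for (or until max_rounds is reached)."""
    for round_number in range(1, max_rounds + 1):
        new_stimuli = respace(stimuli, toy_observer(stimuli))
        if np.max(np.abs(new_stimuli - stimuli)) < tolerance:
            return new_stimuli, round_number
        stimuli = new_stimuli
    return stimuli, max_rounds

start = np.linspace(1.0, 100.0, 9)      # arbitrary first spacing (linear)
final, rounds = iterate_to_stability(start)
ratios = final[1:] / final[:-1]         # constant ratios mean log spacing

print(rounds, np.round(final, 2))
```

For this toy observer the spacing settles where the ratings are equally spaced, which happens when the stimuli are equally spaced on the assumed logarithmic scale. A real series of experiments would of course replace `toy_observer` with a fresh group of Os at every round.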

With enough homogeneous groups of Os, this iterative procedure could in principle reveal the unadulterated form of the category scale, that is, the form in which the effects of expectation have been neutralized and discrimination is the only first-order factor left to interact with O's intent.

FIG. 3. Magnitude and category scales: A, numerousness; B, apparent size of rectangles. (Panel A abscissa: number of dots. Panel B abscissa: area in square inches.)

Our experiments on duration make it clear that for this type of continuum the iterative procedure would probably result in a category scale that is more curved than the scale obtained with linear spacing of the stimuli (see Fig. 2A) and perhaps a little less curved than the scale obtained with logarithmic spacing of the stimuli. The evidence for this conclusion is found in the curves in Fig. 2, where we can also see how the iterative procedure might have worked if we had arbitrarily begun our series of experiments by giving the first group of Os the stimulus series with the crowding at the upper end. The category scale obtained under this spacing is not far from linear. Hence, as the second step in the iterative process we would space the stimuli approximately equidistant from one another and run the next test. As we have already seen, the result for equal spacing is a curve concave downward. This outcome would call for a respacing of the stimuli in the next experiment, to place them closer together at the low end and farther apart at the upper end. This new spacing would presumably result in still more curvature in the resulting data. The outcome of a further step in this iteration is described elsewhere (61).

The stimulus spacing that gives the pure category scale is such that each category of the scale will tend to be used equally often by O. In terms of information theory, this is the spacing that maximizes the amount of information in O's responses.

The iterative procedure suggested here is analogous to the procedure recommended in other connections (57, 64) for determining the form of a function in the face of biasing factors in the experiment. The iterative approach is potentially a powerful strategy for neutralizing certain kinds of distorting forces. We suggest, in fact, that it is the proper answer to the “unsolved” difficulty mentioned by Guilford (15) in his discussion of Sanford’s classical experiment on the category scaling of lifted weights. Titchener (69) and Thurstone (68) also refer to the problem of stimulus spacing and the errors it produces because of O's expectations. “The error,” Titchener says, “cannot be eliminated.” We suggest that it can be. But its elimination will give the pure form of the category scale, which is not in general an equal-interval scale of the psychological magnitude.
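
The remark that equal use of the categories maximizes the amount of information in O's responses can be illustrated with a small calculation. The sketch below computes the Shannon information of the response distribution for two hypothetical sets of category counts; the counts and the function name are invented for illustration, and the measure is the entropy of the responses alone, which is one reading of the statement above.

```python
import math

def response_information(category_counts):
    """Shannon information (bits per response) of the category usage."""
    total = sum(category_counts)
    bits = 0.0
    for count in category_counts:
        if count > 0:
            p = count / total
            bits -= p * math.log2(p)
    return bits

# Hypothetical counts for a 7-point scale, 70 judgments in each case.
equal_use = [10, 10, 10, 10, 10, 10, 10]
uneven_use = [25, 15, 10, 8, 6, 4, 2]

bits_equal = response_information(equal_use)
bits_uneven = response_information(uneven_use)

print(bits_equal, bits_uneven)
```

Equal use of seven categories gives log2(7) bits per response, the largest value possible with seven categories; any uneven usage gives less.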


Judgments of Numerousness 


The term numerousness (51) refers to 
the subjective impression one gets from 
viewing a collection of objects without 
resort to counting. An attempt to con- 
struct a ratio scale of numerousness was 
made by Taves (67) who projected dots 
haphazardly on a screen and asked five 
Os to select another pattern on which the 
dots appeared to be half as numerous 
(method of fractionation). A segment 
of Taves’ scale in units called numers is 
shown in Fig. 3A, together with the 
category scale determined by Guilford 
(16), who used cards on which were 







placed various numbers of dots, ranging 
from 15 to 74 in a roughly geometric 
series. The O’s task was to sort the 
cards into nine piles, attempting to keep 
the intervals between successive piles 
psychologically equal. This is equiva- 
lent to judging the numerousness of the 
dots in terms of a 9-point scale. The 
resulting curve is almost logarithmic, 
which confirms our finding that a loga- 
rithmic spacing of the stimuli tends to 
produce a logarithmic form in the 
category scale (cf. Fig. 2B). 

Guilford’s experiment on one O gives
the same form of curve as Thurstone’s
earlier experiment (68) on 101 Os.
In this more extensive experiment the 
category scale also turned out close to 
logarithmic, which Thurstone took to be 
a verification of Fechner’s law. Thur- 
stone discussed some of the potential 
biases inherent in this method of equal- 
appearing intervals, including the matter 
of stimulus spacing, but he seems not to 
have questioned the basic validity of this 
procedure for psychological scaling. 

It is interesting in this connection that 
Urban (71) perceived the essential truth 
of the matter when he said “the method 
of equal appearing intervals can not 
furnish a real proof of Fechner’s loga- 
rithmic law” (71, p. 232). He went on 
to venture that the 10 points on the scale 
of numerousness achieved by Thurstone’s 
experiment are separated not by a con- 
stant distance but by a constant ratio. 
This is an interesting conjecture, for 
whenever a category scale turns out to 
be a logarithmic function of the scale of 
psychological magnitude, as it sometimes 
does (with certain stimulus spacings), 
Urban’s formulation is precisely true. 

Actually, whenever the scale of psychological magnitude approximates a power function of the stimulus, as it usually does on Class I continua, a category scale that is a logarithmic function of the stimulus magnitude is also a logarithmic function of the psychological magnitude. Thus if the category scale, C, is related to the stimulus scale, S, by

$$C = a \log S$$

and if the scale of psychological magnitude ψ is given by

$$\psi = b S^{n}$$

then

$$C = \frac{a}{n} (\log \psi - \log b)$$

Hence equal steps on C may sometimes represent equal ratios on ψ.
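
The relation between the logarithmic category scale and the power-function magnitude scale can be verified numerically. In the sketch below the constants `a`, `b`, and `n` are hypothetical values chosen only for illustration, and base-10 logarithms are assumed; the check itself holds for any positive constants and any base used consistently.

```python
import math

# Hypothetical constants, for illustration only.
a, b, n = 2.0, 3.0, 0.6

def category_from_stimulus(S):
    """Category scale as a logarithmic function of the stimulus."""
    return a * math.log10(S)

def magnitude_from_stimulus(S):
    """Psychological magnitude as a power function of the stimulus."""
    return b * S ** n

def category_from_magnitude(psi):
    """Category scale expressed through the psychological magnitude."""
    return (a / n) * (math.log10(psi) - math.log10(b))

stimuli = [10.0, 100.0, 1000.0]
categories = [category_from_stimulus(S) for S in stimuli]
magnitudes = [magnitude_from_stimulus(S) for S in stimuli]

# Equal steps on C should go with equal ratios of psychological magnitude.
category_steps = [c2 - c1 for c1, c2 in zip(categories, categories[1:])]
magnitude_ratios = [m2 / m1 for m1, m2 in zip(magnitudes, magnitudes[1:])]

print(category_steps, magnitude_ratios)
```

With these assumed values the two routes to C agree, and each equal step on the category scale corresponds to the same ratio of psychological magnitudes.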


Visual Area 


In the judgment of perceived area the 
magnitude scale is nearly linear against 
physical area, but the category scale is 
concave downward. 

Experiments by Reese and others (44) 
have shown that an area that looks half 
as large as a given area contains roughly 
half the number of square centimeters. 
They proposed the name var (visual area) 
for the unit of perceived area.³


For comparison with the var scale we 
have data on category judgments that 
were kindly supplied us by Parducci. 
In a series of group experiments (38, 39) 
he presented a set of nine square cards 
ranging from 2 to 9 in. on a side and 
asked Os to rate them in five categories 
from “very large” to ‘“‘very small.” 
Some of Parducci’s results are shown in 
Fig. 3B. 

The solid circles represent the average 
category judgments obtained when each 
of the nine areas was presented approximately
equally often. Since the
stimuli were bunched closer at the low 
than at the high end of the scale, the 
curve turns out to be rather steep at the 
low end and flat at the high end. 

With other groups of Os Parducci 
varied the relative frequency of presentation
of the stimuli, with interesting results.
For some groups the smallest five
stimuli were presented eight times as 
often as the largest four. For other 
groups the largest five stimuli were 
presented eight times as often as the 
smallest four. The averages of the 
judgments on the last presentation of the 
nine stimuli are plotted in Fig. 3B. 


³ In another context the electrical engineers
use var as the name for the unit of reactive power. 





FIG. 4. Category scales for lifted weights plotted against normalized coordinates. (Abscissa: relative weight; ordinate: category scale.)


When the small stimuli predominate, 
the curve (squares) becomes even steeper 
over the low range and flatter over the 
high range. Just the opposite occurs 
when the large stimuli are presented 
most often (triangles). This is a clear 
demonstration of the fact that the 
relative frequency of presentation, like 
relative spacing, can alter the local slope 
of the category scale in a predictable 
manner. 

We note that two of the circles representing the average judgments made
when all stimuli were presented about equally often fall below the smooth
curve we have drawn through the other points. This irregularity puzzles us,
because with as many as 30 Os the curve should be smoother. In order to
check on this matter we made up a set of nine cards of the size Parducci
used and determined the somewhat smoother curve plotted as circled crosses
in Fig. 3B.


Lifted Weights 


Psychologists have been lifting weights for more than a hundred years and
the accumulation of evidence regarding judgments of heaviness is large.
Figure 4
shows data obtained in six representative 
independent studies involving category 
scales (5, 7, 8, 23, 32). These experi- 
ments sample a wide range of conditions 
as regards number of judgment cate- 
gories and the range and spacing of the 
stimuli. Yet the functions are all con- 
cave downward. What significant dif- 
ferences there are between the various 
curves are mostly due to stimulus spac- 
ing. Note especially the effects of the 
three different spacings used by Canter 
and Hirsch (7). 

In order to exhibit these data on 
similar plots the coordinates have been 
normalized by a linear transformation to 
make them correspond to an 11-point 
category scale and a 100-point weight 
scale. In other words, the scales used by 
the different authors were stretched or 
shrunk until they all covered the same 
extent on the graph paper. 
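The stretching and shrinking described here amounts to a linear rescaling of each axis. The sketch below is offered under stated assumptions: that the 11-point category scale runs from 1 to 11, that the 100-point weight scale runs from 0 to 100, and that each study's own end points are mapped onto those limits. The helper name `rescale` and the sample numbers are hypothetical.

```python
def rescale(value, old_lo, old_hi, new_lo, new_hi):
    """Linearly map value from the interval [old_lo, old_hi] to [new_lo, new_hi]."""
    fraction = (value - old_lo) / (old_hi - old_lo)
    return new_lo + fraction * (new_hi - new_lo)

# Hypothetical study: weights from 43 to 292 gm rated on a 5-point scale.
weights_gm = [43, 100, 160, 230, 292]
mean_ratings = [1.2, 2.6, 3.5, 4.2, 4.7]

# Assumed targets: a 100-point weight axis (0 to 100) and an 11-point category axis (1 to 11).
relative_weight = [rescale(w, 43, 292, 0, 100) for w in weights_gm]
normalized_rating = [rescale(r, 1, 5, 1, 11) for r in mean_ratings]
```

Because the transformation is linear, it changes the extent of each curve on the graph paper but leaves its direction of concavity untouched.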

With so much data already available on rating-scale judgments of lifted
weights, it seems almost superfluous to add another study. However, in one
experiment we spaced weights logarith- 
mically over a wide range (17 to 645 gm.) 
and had seven Os judge each weight 
three times on a 12-point scale. The 
resulting function (Fig. 4D, circles) is 
highly curved. It is in fact almost 
logarithmic. We next changed the spac- 
ing to distribute the weights more uni- 
formly over the range, and from seven 
additional Os we obtained the curve 
shown by the triangles in Fig. 4D. The 
change in curvature is as obvious as the 
reason for it. 

We then did an experiment that in some respects might seem trivial. We
tested the obvious point that, if the spacing between the stimuli is so wide
that O never makes a “mistake,” the form of the category scale is
indeterminate. When there is perfect discrimination (perfect transmission of
information) the form of the resulting curve depends only on the placement
of the stimuli, which can be varied within whatever range will lead to no
confusions with neighboring stimuli. We used the
weights 17, 331, and 645 gm. and asked
six Os to judge them, four times each, on 
a scale of light, medium, and heavy. 
No one made a mistake, and since the 
weights happened to be evenly spaced 
the resulting curve is linear, as shown in 
Fig. 4D. Incidentally, none of the Os 
was aware that only three weights were 
used. They all thought they were lifting 
more than three different weights. 

The conditions under which one can achieve this limiting case of perfect
discrimination or identification have been the object of much study in
their own right by those interested in the information capacity of
different sense modalities (30). It is found, for example, that on a
single sensory continuum perfect trans- 
mission is not possible with more than 
about five different stimuli. Most people 
make errors if asked to identify more 
stimuli than this, regardless of how they 
are spaced. 

At the other extreme, we might ask 
what happens when the stimuli are 
spaced very close together within a small 
range. Does the category scale keep the 
same form when the stimulus range is so 
short that O can barely discriminate the 
top from the bottom stimulus? For the 
answer to this question we go to the 
classical paper of Wever and Zener (74) 
in which they used the so-called method 
of absolute judgment to determine the 
difference limen for lifted weights. The 
judgments were made on a three-point 
scale, sometimes 1, 2, and 3, and some- 
times light, medium, and heavy. From 
the published graphs of the psychometric 
functions we were able to read off the 
scale values assigned by each O to each 
weight and to average the scale values. 
The results are shown in Fig. 5A, along 
with the results obtained by Fernberger 
(11) in a similar experiment. Plotting 
the data in this form shows two things: 
(a) On the average, Os assign to the middle weight in the series a category
value greater than 2 (the middle category). The so-called “time error” is
probably nothing more or less than a manifestation of this phenomenon (61).
(b) The curves are all concave downward.
Thus the two features observed when the range is larger are clearly evident
when the range covers no more than two or three just noticeable differences.

[Figure 5: two panels titled “Lifted weights.” Panel A legend: observers of Wever and Zener, and of Fernberger. Panel B legend: 50 to 100 grams, categories 4 to 12 (Guilford and Dingman). Abscissa: weight in grams.]

Fig. 5. Category scales for lifted weights: A. obtained with closely spaced weights in experiments designed to measure the just noticeable difference; B. obtained with category scale extendable at both ends.

Figure 5B shows the results of an interesting change in method introduced
by Guilford and Dingman (18). They sought to get rid of the usual “end
effects” observed with category scales by using a 15-point scale, but with
the end stimuli (50 and 100 gm.) defined as categories 4 and 12. These end
stimuli were available to be lifted at O’s pleasure.

Despite the change in method, the curve in Fig. 5B is concave downward,
like all the others. Guilford and Dingman argue, as have many before them,
that the nearly logarithmic form of the category scale for lifted weights
supports Fechner’s law. Our contention would be that the curve in Fig. 5B
has no bearing on Fechner’s law; it merely shows that under these conditions
discrimination is a major factor controlling the form of the category scale.
We would also suggest that this same factor of discrimination explains the
fact that Os used the categories above 12 much more often than they used
the categories below 4. Guilford and Dingman say, “The reason for this
asymmetry in the use of extra categories is not obvious” (18, p. 451). Is it
not simply that discrimination is poorer at the high than at the low end of
the range?

Ratio scale for heaviness.—In order 
to compare the form of the category scale 
for lifted weight with that of the mag- 
nitude scale, we need, of course, to know 
how subjective weight increases with 
physical weight. For a variety of rea- 
sons, this poses a thorny problem in 
scaling. The judgment of heaviness is 
a complex, difficult judgment that is 
easily influenced by a host of second-
order factors, such as manner of lifting,
order of lifting, size-weight illusion, 
method of report, etc. The teasing out 
of all these parameters would be an 
interesting life’s work if the scale of 
subjective weight were an important 
substantive problem in its own right. 
But studies in this area become essays in 
methodology (in the proper meaning of 
this much abused term) and the sub- 
stantive outcome is usually of little 
interest. 

Nevertheless, at least eight independ- 
ent studies have undertaken to establish 
a ratio scale of subjective weight. They 
all concur in showing that, when ap- 
parent weight is plotted against physical 
weight, the curve is concave upward. 
This upward concavity of the magnitude
scale contrasts with the downward
concavity of the category scale.







The agreement among the several 
magnitude scales for subjective weight 
extends even further. With one possible 
exception, they all indicate that to a 
first-order approximation subjective 
weight is a power function of physical 
weight. This we regard as an important 
matter, because the power function ap- 
pears to be the first-order approximation 
for a large class of perceptual continua— 
perhaps for all Class I (prothetic) con- 
tinua (61). The exponent of the power 
function ranges from about .3 for loud- 
ness to about 2.0 for visual flash rate. 

A nice thing about the power function is that when plotted in log-log
coordinates it becomes a straight line, and the slope determines the value
of the exponent. In Fig. 6 we have used log-log coordinates to exhibit the
results of several independent efforts to determine a scale of subjective
weight. The linearity of most of the functions is readily apparent. The
methods used to obtain the data in Fig. 6 have been (a) fractionation by
Harper and Stevens (20), by Taback (the earliest study, but unpublished;
partially reported by Rogers [45]), by Warren and Warren (73), and by
Guilford and Dingman (17); (b) constant sum by Guilford and Dingman (17),
and by Baker and Dudek (3); (c) ratio estimation by Baker and Dudek (3);
and (d) magnitude estimation by members of the Harvard Laboratory.
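The point about log-log coordinates can be illustrated with a short calculation. This is a sketch only: the data are synthetic, generated from an assumed power function rather than taken from any of the studies in Fig. 6, and the helper `fit_power_function` is a hypothetical name for an ordinary least-squares line through the logarithms.

```python
import math

def fit_power_function(stimuli, magnitudes):
    """Fit a straight line to (log S, log psi); return (exponent n, constant k)."""
    xs = [math.log10(s) for s in stimuli]
    ys = [math.log10(m) for m in magnitudes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Least-squares slope in log-log coordinates is the exponent of the power function.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, 10 ** intercept

# Synthetic, noise-free judgments generated from psi = 0.002 * W**1.4 (assumed values).
weights = [20, 40, 80, 160, 320]
judgments = [0.002 * w ** 1.4 for w in weights]

exponent, constant = fit_power_function(weights, judgments)
```

With real data the points scatter about the line, and the fitted slope is then only an estimate of the exponent.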

Under this last method (59) O is usually told that some particular weight
is by definition a given value (e.g., 100) and that his task is to assign to
the other weights numbers proportional to the apparent magnitude. The third
curve from the right in Fig. 6 was obtained in an experiment begun by
Stevens and carried through by J. Nachmias and D. Pertschonok. The weights
ranged from 19 to 193 gm. The standard weight was always available for
comparison and the other weights were presented twice each in random order.
The points in Fig. 6 are plotted at positions proportional to the median
judgments of groups of ten Os.

For the next curve the weights, ranging from 17 to 212 gm., were judged six
times by each of seven Os. The standard was not presented before every
presentation of a variable, but only before every third presentation. The
experimental design was such that each weight was judged with respect to
the standard at positions 1, 2, and 3 removed from the standard. The effects
of position were generally minor as far as the slope of the function is
concerned.

[Figure 6: log-log plot; abscissa: relative weight. The legend, largely illegible in the scan, distinguishes series judged against a standard called 100 from a series with no designated standard.]

Fig. 6. Magnitude scales for lifted weights. The results of various experiments are plotted in log-log coordinates. Abscissa is relative only.

The last curve on the right was ob- 
tained with no designated standard. 
The instructions were merely to assign 
numbers proportional to the apparent 
weight. The shorter series involved 
weights from 17 to 196 gm., and 16 Os 
judged each weight twice. The longer 
series involved weights from 17 to 645 
gm. and 11 Os judged each weight twice. 
Since each O stated his judgments in 
terms of a different modulus, the median 
estimate was computed only after all 
estimates by a given O had been multi- 
plied by whatever constant was required 
to transform his estimate of the 98-gm. 
weight to a value of 100. 
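The modulus adjustment just described can be sketched as follows. The numbers are invented for illustration, each O is assumed to contribute a single estimate per weight (in the experiment each weight was judged twice), and the function name `normalize_to_reference` is hypothetical.

```python
from statistics import median

def normalize_to_reference(estimates, reference_weight, reference_value=100.0):
    """Rescale one observer's estimates so the reference weight maps to reference_value."""
    factor = reference_value / estimates[reference_weight]
    return {weight: estimate * factor for weight, estimate in estimates.items()}

# Invented estimates (weight in gm -> number named) from three observers,
# each of whom happens to use a different modulus.
observers = [
    {17: 2.0, 98: 10.0, 196: 25.0},
    {17: 15.0, 98: 50.0, 196: 110.0},
    {17: 30.0, 98: 200.0, 196: 600.0},
]

# Bring every observer's estimate of the 98-gm weight to 100, then take medians.
normalized = [normalize_to_reference(o, reference_weight=98) for o in observers]
median_estimates = {w: median(o[w] for o in normalized) for w in observers[0]}
```

Multiplying by a constant shifts an observer's function vertically in log-log coordinates without altering its slope, which is why the adjustment leaves the exponent untouched.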

This method in which the standard is not repeatedly available to O is in
essence the same method we used to get direct magnitude estimations for
length and duration. With lifted weights the method seems to give a lower
slope than most of the other methods, which was apparently not true when
the “no standard” procedure was used with length and duration. But it is
clear from Fig. 6 that in the present state of the art no one determination
of slope should be taken too seriously.

The fractionation data from Warren 
and Warren (73) are from the part of 
their experiment in which the weights 
were all of the same volume. They also 
showed that when the sizes of the con- 
tainers are changed the size-weight 
illusion affects the outcome of fractiona- 
tion judgments. 

The procedure used by Taback has not been published, but we have it from
Taback’s teacher (J. Volkmann) and from one of his Os (F. C. Frick) that
the variable weight was continuously adjustable by O, by means of a system
that added or removed oil from a cup held by O but not visible to him. The
method of adjustment was used to determine the weight that felt “half as
heavy” as the standard. Baker and Dudek (3) allude to an unpublished
experiment by Joy in which continuous adjustment was used to obtain
fractionation values by the method of limits. Provision for an adjustable
stimulus is presumably an advantage in experiments of this sort, but it is
probably better to use the method of adjustment by O than to use the method
of limits.

The function obtained by Harper and 
Stevens (20) is steeper than the others 
and slightly curved. Despite pride of 
ownership, at least one of the authors is 
prepared to admit that this function is 
probably too steep to be representative. 
As Guilford and Dingman (17) suggest, 
it may be that the variable weights 
presented to O, from which he was to 
choose the apparent half-value, were 
poorly selected. But the same can be 
said of any selection of variable weights 
unless an iterative process, analogous to
that described above, is used to determine
the range and distribution of the variables.*

Although most Es have not done so, it is usually advisable to check the
results of fractionation by means of multiplication. It has been well
demonstrated in judgments of loudness (58) that, in order to cancel certain
types of bias in the method of fractionation, we need to balance the
experimental design by requiring O to double the apparent magnitude as well
as to halve it.

This brings us to the data obtained by the “constant-sum” method in which O
divides 100 points between two weights in such a way that the division
reflects the apparent ratio between the stimuli. Guilford and Dingman (17)
extended this procedure to require O to partition 100 points among five
weights. In the hands of Guilford and Dingman the method gave results that
agree with their own fractionation data, but Baker and Dudek (3) obtained
rather discordant data.

* The Harper-Stevens scale agrees quite well with some unpublished results
reported to the authors by J. E. Karlin. Karlin used telephone handsets of
varying weight which O picked up with the ordinary motion used in lifting a
handset. Nine Os were used, and values for both halving and doubling were
obtained. Over a range of about 20 to 2000 gm. the scale obtained was a
power function whose exponent was of the order of 2.0. The results for
halving agreed well with those for doubling.

Baker and Dudek also asked Os to 
estimate the apparent ratio between each 
pair of weights, judging the heavier in 
terms of the lighter. This gave a matrix 
of data that presents interesting prob- 
lems of averaging—with no very obvious 
solution. The solution resorted to by 
the authors gives the curved function 
shown in Fig. 6. Why this curvature? 
Our guess is that the curvature stems 
from the requirement that O make 
judgments in terms of a constantly
shifting standard. The inconsistencies 
in the data of Baker and Dudek are 
consistent with the hypothesis that, 
when O is forced to judge ratios against a 
changing standard, the “effective” stand- 
ard is not the standard presented, but is 
a value that is shifted slightly toward the 
center of the series. Thus O names too 
small a ratio for pairs of light weights 
and too large a ratio for pairs of heavy 
weights. This effect is analogous to 
that described by Woodrow (75) who 
used a variable standard weight under 
the method of constant stimuli. The 
same process seems to be at work in an 
experiment on loudness described below 
(see Fig. 10B, unfilled triangles). 

In order to try to reduce the direct 
ratio judgments of Baker and Dudek to 
a function with a constant slope (in log- 
log coordinates) we used the data to 
obtain a separate slope constant (the 
exponent of the power function) for 
each pair of stimuli judged. All 36 
slopes turned out to be greater than 1.0 
(range: 1.08 to 1.88). We then averaged 
these slopes and obtained the line labeled 
“mean slope” in Fig. 6. This at least 
gives us a function comparable in form 
to the other curves. 
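The pairwise computation can be sketched as follows, on the assumption that the slope for a pair is the exponent n satisfying (heavier/lighter)ⁿ = judged ratio. The weights and judged ratios below are invented; nine weights are used only because nine stimuli yield the 36 pairs mentioned in the text.

```python
import math
from itertools import combinations

def pair_exponent(w_light, w_heavy, judged_ratio):
    """Exponent n for which (w_heavy / w_light) ** n equals the judged ratio."""
    return math.log(judged_ratio) / math.log(w_heavy / w_light)

# Invented weights in grams; nine weights give 36 distinct pairs.
weights = [100, 110, 120, 130, 140, 150, 160, 170, 180]

# Invented judged ratios, generated here from an assumed true exponent of 1.5.
judged = {(light, heavy): (heavy / light) ** 1.5
          for light, heavy in combinations(weights, 2)}

# One slope constant per pair, then the mean of all of them.
exponents = [pair_exponent(light, heavy, ratio)
             for (light, heavy), ratio in judged.items()]
mean_slope = sum(exponents) / len(exponents)
```

With real judgments the 36 exponents differ from pair to pair, and their mean serves as a single slope for comparison with the other curves.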

What then is the form of the scale of subjective weight, the veg scale? It
seems clear that it is a power function and that the exponent is greater
than 1.0. Until we can better determine which is the best experiment,
perhaps a defensible procedure is to adjust our choice to the consensus of
the experimental results. The slopes in Fig. 6 range from about 1.13 to
2.07. The mean is 1.47 and the median is 1.43. As an approximate working
rule, therefore, we suggest the equation

$$V = 0.00126\,W^{1.45}$$

where V is subjective weight in vegs (1 veg corresponds to 100 gm.) and W
is weight in grams. In linear coordinates this veg scale has the form shown
in Fig. 7.

[Figure 7: veg scale (ordinate) plotted against grams (abscissa).]

Fig. 7. Veg scale of subjective weight derived from Fig. 6.
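As a rough numerical check on the working rule, the sketch below evaluates it at 100 gm and asks what weight should feel half as heavy. The coefficient 0.00126 and exponent 1.45 are read from a partly illegible equation and should be treated as assumptions; they were chosen because they lie between the reported mean and median slopes and make 100 gm come out at about 1 veg. The function names are hypothetical.

```python
COEFFICIENT = 0.00126  # assumed reading of the printed constant
EXPONENT = 1.45        # assumed reading; between the median (1.43) and mean (1.47)

def vegs(weight_gm):
    """Subjective weight in vegs under the rule V = COEFFICIENT * W ** EXPONENT."""
    return COEFFICIENT * weight_gm ** EXPONENT

def grams_for_ratio(weight_gm, ratio):
    """Physical weight whose veg value is `ratio` times that of weight_gm."""
    return weight_gm * ratio ** (1 / EXPONENT)

one_veg = vegs(100)                      # should be close to 1 veg
half_weight = grams_for_ratio(100, 0.5)  # weight predicted to feel half as heavy as 100 gm
```

Under these assumptions a weight of roughly 62 gm, not 50 gm, is predicted to feel half as heavy as 100 gm, which is what an exponent greater than 1.0 implies.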


The question of validity.—An equation such as 
the one proposed above may be expected to hold 
under some set of “standard conditions,” e.g., 
lifting weights of standard, uniform size under a 
standard method of lifting. It will not necessarily
hold for the lifting of weights that differ in
size, or weights presented in different ways. As 
in all scientific endeavor we have to start with 
some set of “standard conditions,” determine the 
empirical rules, and then explore the problem of 
the invariance of the rules under transformations 
of the conditions. Contrary to what some 
authors seem to imply, the failure of invariance 
to hold does not invalidate the rules or the 
equations that hold for the standard conditions. 
Our aspiration, of course, is to formulate rules 
of wide invariance, for that is the chief aim of 
the scientific enterprise. The demonstration 
that the outcome of an experiment depends on 
“conditions” is a way of showing that invariance 
is limited, but this fact has no necessary bearing
on the problem of validity.


The validity of a subjective scale, or of any 
other scale, is always a matter of opinion. 
Valid is what makes sense to the scientific com- 
munity in terms of the problems before it, and, 
unfortunately, when we push the problem back to where we have to make
fundamental choices, there are no external criteria to guide the ultimate
value judgments that have to be made. Reliability is a tempting criterion,
but sometimes we find that agreement among experimental results is due to
the operation of factors that force agreement, as when all Os give
identical ratings to the three weights shown by the squares in Fig. 4D.
Predictive power is another
tempting criterion, but it occasionally happens 
that prediction succeeds for wrong reasons, as 
when Fechner’s law predicts the outcome of 
some types of category judgments. What we 
consider to be valid measures of things is subject 
to constant revision because we are always up 
against the uncertain task of deciding, without 
firm external criteria, that the given measures 
do or do not assess the things we are interested in. 


Loudness 


It was the problem of loudness judg- 
ments that first excited our interest in 
the nonlinearity of rating scales. The 
senior author had taken it for granted 
that if a group of judges were to rate the 
loudness of noises on an N-point scale the 
ratings would be linear against the sone 
scale of subjective loudness. He was 
therefore puzzled when Callaway (6) reported a study in which a jury of
listeners judged, on a 6-point scale, the loudness of various trucks on a
highway. The jury ratings were not linearly related to sones, but were
roughly linearly related to log sones. A similar finding was made by
Beranek (4) who used a rating scale to assess the opinions of office
workers regarding the noisiness of various offices.
The sone values were computed from 
octave-band analyses of the noises by a 
procedure developed by Stevens (60). 

These results seemed at first to indicate 
a serious deficiency in the sone scale. In 
the light of our findings in other sense 
departments, it is now clear that any 
other outcome of these rating tests 
would have been anomalous. Rating 
scales of loudness give an assessment 
that cannot be expected to agree with the 
sone scale.

The sone scale itself has probably been 
studied by more laboratories in more 
different countries than any other ratio 
scale of subjective magnitude, and the 
extensive data have been reviewed elsewhere (58). Suffice it to say that,
for a 1000-cycle tone, the subjective loudness in sones L is related to
intensity (sound energy) I by a power function of the form $L = kI^{0.3}$.
We will make use of this function to exhibit the nonlinearity
of category scales of loudness. We use
the sone scale rather than an arbitrary 
physical scale (sound pressure, sound 
energy, or decibels) because the basic 
invariance we are interested in holds 
only for the relation between category 
scales and ratio scales of psychological 
magnitude. This is indeed a case in 
which a relation becomes invariant pro- 
vided the proper measure is used. 
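For readers who wish to move between decibels and sones, the power function can be put in a convenient form. The sketch below assumes an exponent of 0.3 on sound energy and fixes the constant k by the one correspondence quoted later in this paper, namely that a 70-db stimulus is taken as 8 sones. The function name and the choice of anchor are assumptions made for illustration.

```python
def sones(level_db, ref_db=70.0, ref_sones=8.0):
    """Loudness in sones from level in db, using L = k * I**0.3.

    Energy I is proportional to 10 ** (level_db / 10), so I ** 0.3 is
    proportional to 10 ** (0.03 * level_db). The constant k is fixed by the
    assumed anchor point (ref_db, ref_sones).
    """
    return ref_sones * 10 ** (0.03 * (level_db - ref_db))

at_70 = sones(70.0)
ratio_per_10_db = sones(80.0) / sones(70.0)
```

On this form each 10-db step multiplies loudness by 10^0.3, or very nearly a factor of two, so a scale that is equally spaced in decibels is strongly bunched toward the low end when replotted in sones.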

The data in Fig. 8A (circles) were 
kindly supplied by W. R. Garner who 
had 20 Os rate the loudness of a tone on 
a scale from zero to 20. The stimuli 
were spaced logarithmically from 5 to 100 
db (5-db steps). Only every other step 
is plotted in Fig. 8A. Other interesting 
features of this experiment have been 
discussed elsewhere (59), but the im- 
portant point to be made here is that 
when the ratings are plotted against 
sones the curve is concave downward. 
This is despite the explicit instructions 
given O to use the middle category, 10, 
for the loudness that seemed half as 
great as the loudest tone. The O was 
also instructed to use all the categories. 

With the assistance of H. Rubin, we 
carried out a series of experiments in- 
volving category judgments of the loud- 
ness of white noise. The usual pro- 
cedure was to inform O of the range of 
stimuli to be presented by sounding a 
stimulus a little below the lowest in the 
series and a little above the highest in the 
series. (We later found that presenting 
the lowest and the highest stimuli is just 
as satisfactory.) Groups of 20 Os judged 
the loudness of the noises in terms of 
category scales involving 3, 7, 20, and 100 
steps, and in terms of a scale defined by 
seven adjectives. Each stimulus was 
judged twice by each O, and the points 
in Fig. 8 show the average rating given 
each stimulus. For the purpose of plotting the data, the sone values for
the noise were taken to be the same as the sone values for a 1000-cycle
tone of the same sound pressure level. This is not strictly true, but the
error in this assumption affects the shape of the resulting curves only to
a degree that is negligible for our present purpose. In a group experiment
on the magnitude estimation of the loudness of white noise, involving 70
Os, J. C. Stevens and Tulving (49) showed, for example, that the subjective
scale for white noise is essentially a power function of intensity.

[Figure 8: panels on loudness. Ordinate: category scale (median category assigned); abscissa: loudness in sones. Panel legends recoverable from the scan: 1000-cycle tone, 20 levels, 21-point scale (Garner); white noise, 15 levels, 20-point scale; white noise, 7-point scales; white noise, unlimited point scale.]

Fig. 8. Category scales for loudness.

Figure 8B shows that it makes no difference to the form of the function
whether O uses a 7-point numerical scale or a scale defined by seven
adjectives. The fact that the curve for the adjectival scale falls a little
below that for the numerical scale reflects the fact that some of these Os
felt that the loudest noise used was not what they would normally call
“very, very loud.” Thus the semantic system O brings with him to the
experiment can sometimes affect the semantic system we try to establish
when we exhibit the defining stimuli for an adjectival scale.

Figure 8C shows the results for the 
3-point scale and the 100-point scale. 
When the number of categories is as few 
as three the curve becomes somewhat 
irregular, almost in a stepwise manner. 
Otherwise it shows the same general 
form as the curve based on 100 categories. Of course, when O is told to use
a 100-point scale he does not use all the points. He apparently tries to
subdivide the 100-point scale into subsections of manageable size, and he
tends to report his judgments in “round numbers.”

Rubin also carried this process to the limit and asked 20 Os to use as many
categories as they needed, an “infinite” number if necessary. In order to
plot the median judgments shown in Fig. 8D the various numbers used by each
O were multiplied by an appropriate factor such that the values assigned to
the 70-db stimulus (8 sones) were made to average 10. When O is not limited
to a finite set of predetermined categories, the scale becomes a closer
approximation of the sone scale. There is still some curvature in Fig. 8D,
however, possibly because some Os may have attempted to partition the
continuum, as they did in using a finite category scale.

Of course, the particular forms of the curves observed in Fig. 8 are
functions of the spacing of the stimuli, which in all the examples
discussed so far was logarithmic (5-db steps). Consequently, we ran some
additional experiments like those of Rubin, but in which the spacing
between the stimuli was varied. In one experiment the spacing was
approximately equal in sones rather than in decibels. The results are shown
in Fig. 9A.

We then examined the frequency distributions of the category assignments
under decibel spacing and under sone spacing. Both distributions are highly
skewed: low numbers are used much more frequently with decibel spacing,
high numbers with sone spacing. From these distributions we tried to
estimate the spacing that would lead to a symmetrical distribution and to
produce thereby what we have called the “pure” form of the category scale.
This is a kind of short-cut iterative procedure. We chose the stimulus
values 40, 50, 58, 65, 72, 78, 83, 88, 92, 96, and 100 db. This adjusted
spacing gave the curve shown by the squares in Fig. 9A. It differs from the
curve produced by the sone spacing in the way we would predict. The
adjusted spacing came close to producing the pure category scale in the
sense that the distribution of category use was reasonably symmetrical. The
categories from 1 to 7 were used with the following frequencies: 26, 32,
23, 16, 27, 28, 24. In this symmetrical distribution the middle category
was used least often. We find that this apparent avoidance of the middle
category is not unusual, although we do not know how universal it may be.
Perhaps it is related to Preston’s (42) demonstration that O tends to avoid
the repeated naming of the middle category in a three-category experiment
with lifted weights.

[Figure 9: two panels on loudness. Panel A legend: sone spacing; “adjusted” spacing; abscissa: sones. Panel B legend: O adjusts stimulus (with end stimuli); O assigns categories; abscissa: decibels.]

Fig. 9. Category scales for loudness: A. effects of stimulus spacing; B. experiment in which one group of Os adjusted loudness to designated categories (filled triangles), and another group judged stimuli (circles) spaced according to results obtained from the first group. The unfilled triangles show the levels used to define Categories 1 and 7 for the adjustment experiment.

At this point one O, Geraldine Stone, suggested that if we want to know how
the stimuli should be distributed perhaps we should ask the listeners! We
should acquaint O with the bottom stimulus (Category 1) and with the top
stimulus (Category 7) and then ask each O to adjust the level of the noise
to produce the various categories, in random order. Let us call this method
category production. The procedure is possible with auditory stimuli
because it is easy to arrange a potentiometer (64) so that O can vary the
level over a wide range. The bottom stimulus was 55 db, the top 95 db,
shown by unfilled triangles in Fig. 9B. The median results (15 Os) plotted
against a decibel scale give a curve that is slightly concave upward. (This
upward concavity against decibels is typical of all the category scales we
have obtained, but of course the amount of curvature depends on the
stimulus spacing.) The ends of the curve in Fig. 9B are displaced relative
to the original end stimuli (55 and 95 db), showing a tendency for O to
shorten the range. This type of contraction is presumably analogous to the
“central tendency” observed in other types of “production” procedures (24).
Plotted against sones, the curve becomes concave downward, as expected.
Since the range here is shorter than that portrayed by the squares in
Fig. 9A, the curvature appears superficially to be less, but it is actually
quite similar to that shown by the squares in Fig. 9A.

[Figure 10: Panel A, musical “dynamics” (Reger), 7-point scale, with a second curve for the “wrong” spacing. Panel B, comparative rating with fixed standard and variable standard (Michels and Doser). Abscissa: sound pressure level in decibels.]

Fig. 10. A. Scale of musical “dynamics.” Os made category judgments (triangles) of stimuli spaced according to Reger’s version of the musical scale of loudness. The circles show what happened when the stimuli were inadvertently bunched at the low end of the scale. B. Comparative rating scales of loudness. Various intensities were judged relative to a stimulus at 88 db.

The next step was to use the curve 
determined by category production in 
order to select a stimulus spacing for 
another experiment on categorizing. For 13 stimuli so spaced, 8 Os made
judgments on a scale from 1 to 7. The circles in Fig. 9B show the average
category assigned to the various intensities.

It is clear that the curvature produced 
by the method of category production 
can be recovered when another group of 
Os assigns categories to stimuli spaced in 
accordance with the settings made by 
the first group. In this procedure we 
seem to have a rather direct method of 
approximating the pure form of the 
category scale. 

It should be pointed out that the ad- 
justment of the stimuli to pre-assigned 
categories (category production) is not 
unlike the process of bisecting (or equi- 
secting) a sense distance by the method 
of adjustment. Since adjustment to 
categories gives a function that is non- 
linear against sones, we would predict 
that bisection would give a similar result, 
which it does (13, 37, 61, 63). It appears 
that the systematic “error” in 
bisection is all of a piece with the non- 
linearity of rating scales. 

Another interesting bit of evidence 
regarding the form of the category scale 
for loudness lies in the nature of the 
rating scale used by the musician. The 
7-point “dynamic” scale from ppp to fff 
is a widely used scale, and it is an in- 
teresting question whether this scale is 
linear against sones, against decibels, or 
against neither. From what we have 
already learned, we would predict that 
the musician’s scale would certainly be 
nonlinear against sones. As a matter of 
fact, if it were linear against sones the 
difference between ff and fff would have 
to be no more than a couple of decibels— 
a difference that would be scarcely de- 
tectable to the listener in a concert hall. 
On the other hand, the foregoing experi- 
ments suggest that the musician’s scale 
of “dynamics” may be expected to show 
a slight acceleration as a function of 
decibels. This, in fact, is what is sug- 
gested by the levels tentatively proposed 
by Reger as typical of the musical loudness 
scale. The form of Reger’s scale, 
as reported by Seashore (48), is shown 
in Fig. 10A. 

In order to see how listeners in the 
laboratory would judge stimuli spaced 
according to Reger’s curve, we set up 
seven levels of white noise spaced according 
to the steps proposed by Reger. The 
Os made three judgments of each level on 
a 7-point numerical scale. 
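
The earlier remark that a dynamic scale linear against sones would leave only a couple of decibels between ff and fff can be checked with a short calculation. The sketch below is illustrative only. It assumes the rule that loudness in sones doubles for each 10-db increase, with 1 sone at 40 db, and it assumes hypothetical end points of 40 db for ppp and 100 db for fff; neither end point is taken from Reger's levels, and the function names are invented for the example.

```python
import math


def db_to_sones(level_db):
    """Assumed rule: 1 sone at 40 db, loudness doubling every 10 db."""
    return 2.0 ** ((level_db - 40.0) / 10.0)


def sones_to_db(sones):
    """Inverse of db_to_sones."""
    return 40.0 + 10.0 * math.log2(sones)


def sone_linear_levels(bottom_db, top_db, n_steps=7):
    """Decibel levels of n_steps categories equally spaced in sones."""
    lo = db_to_sones(bottom_db)
    hi = db_to_sones(top_db)
    step = (hi - lo) / (n_steps - 1)
    return [sones_to_db(lo + i * step) for i in range(n_steps)]


# Hypothetical end points: ppp at 40 db, fff at 100 db.
levels = sone_linear_levels(40.0, 100.0)
top_gap = levels[-1] - levels[-2]    # separation of ff and fff in db
bottom_gap = levels[1] - levels[0]   # separation of ppp and pp in db
```

Under these assumptions the top step comes out between 2 and 3 db while the bottom step exceeds 30 db, which is the lopsided spacing the argument treats as implausible for a musician's scale.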

We actually ran two experiments: the 
first one was a “mistake,” but an inter- 
esting one. In translating Reger’s spac- 
ing into decibels of attenuation on our 
apparatus we inadvertently put the wide 
spacings at the top end of the scale 
instead of at the bottom end. By the 
time we had run six Os it was clear that 
something was wrong. The categories 1, 
2, and 3 were being used far too often. 
We then corrected the spacing to con- 
form to Reger’s curve, with the results 
shown by the triangles in Fig. 10A. 

With Reger’s spacing the experiment 
gave a curve that is concave upward 
throughout its length, rather than linear 
over its upper portion, but it appears 
that Reger’s estimate is probably not far 
off as regards the relative spacing of the 
steps. 

It is interesting to note that the 
“wrong” spacing used in our first attempt 
produced the expected changes in slope. 
The relative bunching of the stimuli at 
the low end makes the slope slightly 
steeper over this region, and the greater 
spread between stimuli at the high end 
makes the curve a little flatter there. 

Let us now consider another device by 
which we can change the local slope of 
the category function. As we have said, 
the basic generalization concerning cate- 
gory scales is that the category width is 
determined by discrimination (plus the 
perturbations, if any, produced by the 
relative spacing or frequency of the 
stimuli). If we can alter discrimination 
locally we should affect the slope of the 
category scale. 

One possibility is to improve O’s ability 
to discriminate among certain of the 
stimuli by setting up a landmark of some 
sort. This is what sometimes happens 
when a rating scale is used for com- 
parative judgments of a stimulus relative 
to a standard. Figure 10B shows an 
example. Michels and Doser (28) asked 
four Os to judge the loudness of various 
stimuli relative to that of a 1000-cycle 
tone at 88 db. The comparison levels 
were to be rated relative to the standard 
on a 7-point scale, “very much louder” 
... “equal” ... “very much softer.” 
On the ordinate in Fig. 10B zero corresponds 
to the category “equal.” The 
points represent averages of all the data 
presented by the authors. 

We see that the category scale is very 
steep in the vicinity of the standard. 
The stimulus 5-db removed from the 
standard is usually called “louder” or 
“softer,” seldom “equal.” The landmark 
provided by the standard stimulus 
apparently improves discrimination in 
its immediate vicinity, which steepens 
the slope locally with a consequent 
flattening elsewhere. 

The unfilled triangles in Fig. 10B 
represent the data obtained when a fixed 
tone of 88 db was judged relative to a 
variable “standard.” In order to interpret 
this curve the numbers on the 
ordinate should have their signs reversed. 
It is plotted in this manner in order to 
show its relation to the curve obtained 
with a fixed standard. The combination 
of a roving standard and fixed com- 
parison tone is less effective in producing 
steepness in the vicinity of the fixed tone, 
but it is clearly not impotent in this 
respect. As we have already noted 
above, when O is required to judge 
against a standard that varies, he never 
quite does so. His “effective” standard 
is shifted toward the center of the series. 
The generally flatter slope of the function 
represented by the unfilled triangles is 
evidence of this shifting of the “effective” 
standard. 

Another example of the effects of a 
landmark will be discussed below when 
we deal with scales of position. 


Brightness 


It has already been shown that in 
many respects brightness behaves much 
like loudness (54). Although an ex- 
tensive investigation of the subjective 
scale of brightness is still in progress in 
this laboratory, enough has been learned 
to show that, for patches of white light 
viewed in a dark room, subjective bright- 
ness is a power function of luminance. 
Moreover, the exponent is of the order 
of one-third which is in reasonable agree- 
ment with results obtained by Hanes 
(19).⁵ 

In order to make it easier to express 
the intensity of the visual stimulus, and 
to compare the facts in vision with those 
in hearing, it has been proposed (56) 
that luminance be stated in decibels re 
10⁻¹⁰ lambert. This puts the reference 
level of the decibel scale for luminance a 
little below the absolute threshold of the 
eye, just as the reference level widely 
used in hearing (10⁻¹⁰ microwatt per 
cm.²) is a little below the threshold of 
the ear. These two decibel scales reveal 
some interesting parallels. Thus the 
level for comfortable listening and the 
level for comfortable viewing both turn 
out to be around 70-80 db, and the 
thresholds of discomfort in the eye and 
the ear are both of the order of 110-120 
db. On the decibel scale a luminance 
equal to 1.0 ml. is at 70 db. 

⁵ It is of historical interest that Delboeuf (10), 
in his account of his conversations with Plateau, 
says that Plateau suggested an exponent of one-third 
for the power function relating brightness 
to intensity. But, as we shall see, the similarity 
of these exponents is a mere coincidence, for 
Plateau was really talking about a bisection 
experiment. 

By analogy with the definition of the 
sone (50), it has been suggested that the 
bril, the unit of subjective brightness, be 
set at 40 db. Thus the bril is the brightness 
seen by the typical viewer (dark 
adapted) when confronted with a surface 
whose luminance is 40 db, i.e., 1.0 micro- 
lambert. In what follows we will talk 
about decibels, simply because it is an 
easier notation than log millilamberts, 
log footlamberts, or any others of the 
manifold units of luminance. 
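
The decibel notation for luminance can be made concrete with a small conversion routine. The sketch below takes the reference level of 10⁻¹⁰ lambert from the text and reproduces the two landmarks mentioned there (1.0 millilambert at 70 db and 1.0 microlambert at 40 db); the function names are illustrative choices, not an established API.

```python
import math

REFERENCE_LAMBERT = 1e-10  # reference luminance stated in the text


def luminance_db(lamberts):
    """Luminance level in decibels re 1e-10 lambert."""
    return 10.0 * math.log10(lamberts / REFERENCE_LAMBERT)


def db_to_lamberts(level_db):
    """Inverse conversion: decibels re 1e-10 lambert back to lamberts."""
    return REFERENCE_LAMBERT * 10.0 ** (level_db / 10.0)


one_millilambert_db = luminance_db(1e-3)   # expected to be about 70 db
one_microlambert_db = luminance_db(1e-6)   # expected to be about 40 db
```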


We will first compare the scale of subjective 
brightness with the results obtained with a 7- 
point rating scale. Both were obtained with the 
same apparatus, which consisted of a projection 
lantern illuminating the back side of a piece of 
milk glass framed by a black cardboard mask. 
The O sat in the dark and saw a spot of light 
subtending a visual angle of about 5°. The spot 
was “on” for about 1 sec., and about 10 sec. 
elapsed between judgments. The luminance 
was controlled by neutral filters inserted in the 
beam of the lantern. 

A segment of the bril scale was obtained by 
the method of magnitude estimation, as follows. 
The O was first dark adapted for about 3 min. 
and then shown a luminance of approximately 
73 db, which he was told would be arbitrarily 
called 10. He was shown this level two or three 
times and told to try to remember it. He was 
further instructed that he would see a series of 
brightnesses presented in “random” order and 
that his task was to assign numbers proportional 
to the brightness as he saw it. He was told to 
use any numbers he found necessary (fractions, 
whole numbers, or decimals) but to try to keep 
the numbers proportional to the brightnesses. 
Each of ten Os then viewed 10 levels of luminance 
covering a range of 4 log units, or 40 db, 
with two presentations of each stimulus and a 
different stimulus order for each O. 


The median estimates are plotted in 
Fig. 11A. In this particular case, the 
slope (the exponent of the power function) 
is .36. Other evidence indicates 
that this slope is a little steeper than the 
slope usually obtained with a more fully 
dark-adapted eye, but our concern here 
is not with the exact value of the exponent. 
Rather it is with the relation 
of this power function to the curve obtained 
under the identical conditions 
when the Os were told to judge the series 
of brightnesses on a 7-point scale. 

Fig. 11. Magnitude and category scales for brightness: A, direct magnitude estimations for luminous surface plotted in log-log coordinates; B, category scales for luminous surface plotted against scale of brils determined by the line in A; C, direct magnitude estimations of the brightness of the “star” (point source) plotted in log-log coordinates; D, category scale (triangles) for “star” plotted against the scale of subjective brightness determined by the line in C, and category scale (circles) for “star” plotted against luminance in decibels. 

In order to compare the mean category 
assignments with the magnitude function 
in Fig. 11A, we use the magnitude 
function as the abscissa of the plot 
shown in Fig. 11B. The triangles in 
Fig. 11B represent the category judgments, 
and we see that as a function of 
subjective brightness the category scale 
is concave downward. 

As is found with loudness, when the 
stimuli are spaced approximately equi- 
distant in decibels the category scale is 
quite steep over the low end of the range. 
The question now is this: Will a spacing 
of the light intensities that is more 
nearly uniform in terms of brils alter the 
category scale as did the equal sone 
spacing? The available filters did not 
allow us to set up a spacing that was 
precisely equal in brils, but we were able 
to approximate it by dropping out two of 
the stimuli near the low end (49 and 59 
db). The result, shown by the squares 
in Fig. 11B, is as expected: the curve is 
less steep at the low end and more steep 
at the high end. The over-all effect is 
to reduce the curvature, much as the 
sone spacing reduced the curvature of the 
scale shown in Fig. 9A. 
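
The difference between spacing in decibels and spacing in brils can be illustrated numerically. The sketch below assumes the power function with the exponent .36 read from Fig. 11A and places the unit at 40 db, as in the definition of the bril. The five levels are hypothetical and are not the filter settings used in the experiment; the function name is likewise an illustrative choice.

```python
def relative_brils(level_db, exponent=0.36, unit_db=40.0):
    """Subjective brightness under an assumed power function.

    A power function of luminance is a straight line in log-log
    coordinates, so log10(brils) = exponent * (level_db - unit_db) / 10.
    """
    return 10.0 ** (exponent * (level_db - unit_db) / 10.0)


# Hypothetical stimuli equally spaced in decibels.
levels_db = [50.0, 60.0, 70.0, 80.0, 90.0]
brils = [relative_brils(level) for level in levels_db]

# Successive differences in brils: equal decibel steps are unequal bril steps.
gaps = [upper - lower for lower, upper in zip(brils, brils[1:])]
```

Each 10-db step multiplies the bril value by the same factor, so the differences in brils grow toward the top of the range. Stimuli equally spaced in decibels are therefore crowded together at the low end of the bril scale, which is why dropping stimuli near the low end moves the series toward equal bril spacing.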

Stellar magnitudes.—The oldest cate- 
gory scale of which we have record is the 
scale from 1 to 6 in terms of which the 
early astronomers judged the brightness 
of the stars. This scale is upside down 
in the sense that lower numbers corre- 
spond to greater brightnesses. Modern 


astronomers use the same scale, with 
other categories added, but they now 
measure “stellar magnitudes” by photoelectric 
photometry rather than by 
visual estimate. The modern rule defines 
the step between successive stellar 
magnitudes as 4 db (31). Thus the 
modern scale of stellar magnitudes is a 
logarithmic scale. 
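
The 4-db step can be verified directly. The sketch below relies on the modern convention that a difference of 5 stellar magnitudes corresponds to an intensity ratio of 100; that convention is supplied here as background and is not spelled out in the text. The function names are illustrative.

```python
import math


def magnitude_step_db():
    """Decibel size of one stellar magnitude, given 5 magnitudes = ratio of 100."""
    ratio_per_magnitude = 100.0 ** (1.0 / 5.0)
    return 10.0 * math.log10(ratio_per_magnitude)


def level_difference_db(magnitude_a, magnitude_b):
    """Level of star A relative to star B in decibels.

    The scale is upside down: the brighter star has the lower magnitude.
    """
    return 4.0 * (magnitude_b - magnitude_a)


step = magnitude_step_db()
first_vs_sixth = level_difference_db(1.0, 6.0)
```

On this rule a first-magnitude star lies 20 db above a sixth-magnitude star, so the whole of the classical six-category scale spans a range of about 20 db.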

From our point of view the interesting 
question concerns the form taken by the 
visual scale of stellar magnitude before 
the days of photometric measurements. 
That the older category scale was ap- 
proximately logarithmic seems quite 
clear, for the modern 4-db step was 
chosen to harmonize with the older scale. 
Fechner took this logarithmic relation 
to be an important corroboration of his 
law (22). 


To compare with the scale of stellar magni- 
tude we have the results of some experiments on 
the brightness of small targets. A series of ex- 
periments was undertaken by J. C. Stevens who 
used a spot .75 mm. in diameter viewed from a 
distance of 1 m. This spot, subtending an angle 
of about 1.5 min. of arc, looked much like a star 
in the sky. It was made by a pinhole placed 
before a luminous milk-Plexiglas surface and the 
luminance of the small spot is taken to be the 
same as the luminance of the Plexiglas surface. 
The luminance level was controlled by neutral 
filters inserted in the light beam ahead of the 
Plexiglas. Casual tests with three Os showed 
that the threshold luminance of the small spot 
was about 15 db below the faintest stimulus 
used in the experiments. This stimulus had the 
appearance of a faint star, but did not look as 
faint as some of the faintest stars in the sky. 
Before beginning each test O was dark adapted 
for about 10 min. 


A ratio scale of subjective magnitude, 
determined by the method of magnitude 
estimation is shown in Fig. 11C (median 
estimates for 15 Os). The straight line 
in this log-log plot has a slope of .47, 
which indicates that, as a function of 
intensity, the subjective brightness of a 
small target grows more rapidly than the 
brightness of a large target (cf. Fig. 11A). 
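
One way to make the two exponents comparable is to ask how many decibels of luminance are needed to double the subjective brightness under each power function. The sketch below uses only the exponents .36 (Fig. 11A) and .47 (Fig. 11C) from the text; the function name is an illustrative choice.

```python
import math


def db_to_double(exponent):
    """Decibel increase that doubles a magnitude following a power function.

    If magnitude is proportional to intensity ** exponent, doubling requires
    an intensity ratio of 2 ** (1 / exponent), or 10 * log10(2) / exponent db.
    """
    return 10.0 * math.log10(2.0) / exponent


large_target_db = db_to_double(0.36)   # luminous surface, Fig. 11A
small_target_db = db_to_double(0.47)   # "star", Fig. 11C
```

With these exponents the small target needs roughly 6.4 db for a doubling against roughly 8.4 db for the large target, which is what is meant by saying that the brightness of the small target grows more rapidly.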

With the same apparatus a series of 
category judgments was obtained from 
a group of 15 Os. The mean category 
judgments are plotted in Fig. 11D 
against two kinds of scales. The sub- 
jective scale was determined by the line 
in Fig. 11C. Plotted against this scale, 
the category judgments (triangles) give 
a curve that is concave downward, 
essentially like the upper curve in Fig. 
11B. In both cases the high degree of 
curvature is due to the fact that the 
stimuli were spaced logarithmically, or 
approximately so. 

Plotted against decibels, the category 
scale (circles) is concave upward. In 
this respect the outcome is similar to that 
observed with loudness (Fig. 9B and 
10A) and with the lightness of gray 
papers (Fig. 12B). The degree of this 
upward curvature would increase, of 
course, if the spacing of the stimuli were 
made uniform in brils rather than in 
decibels. 

Since the category scales for neutral 
grays and for luminous spots are both 
concave upward when plotted against 
decibels, an interesting question arises: 
were the older, visually determined scales 
of stellar magnitude really linear against 
decibels, or were they too concave upward? 
This question ought in principle 
to be answerable. Of course, different 
astronomers used somewhat different 
scales, so the pursuit of the answer may 
well turn out to be full of complications. 
We have sampled enough of the older 
authors (Herschel, Chambers, Flam- 
steed) to see that the problem is not easy 
for those of us who are untutored in 
astronomy. Perhaps someone better 
qualified than we are should try to answer 
this question. 

Lightness of grays.—Thus far we have 
dealt with the brightness of luminous 
spots viewed against a dark surround. 
Most of our everyday judging of bright- 
ness takes place under rather different 
conditions. In a well-lighted room the 
adaptation level of the eye is higher than 
it was in the foregoing experiments, and 
the appearance of the surfaces is subject 
to simultaneous contrast. As shown by 
experiments like those of Craik (9) and 
Heinemann (21), light adaptation and 
contrast both operate to increase the 
slope of the brightness function. We 
should expect therefore that the power 
function relating subjective lightness to 
the reflectance of neutral gray surfaces 
would have a larger exponent than the 
one observed with luminous spots seen 
against a dark surround. 


We used a series of 10 neutral gray papers 
(3 × 5 in. swatches from Color-Aid Company) 
covering a reflectance range of approximately 
15 db from black to white, as measured with a 
Macbeth Illuminometer and checked against a 
Munsell Neutral Value Scale. The papers were 
viewed against a grayish background of different 
appearing texture whose reflectance was about 
2.5 db below that of the lightest paper in the 
series. The illumination was such that the 
luminance of the whitest paper was about 80 db 
re 10⁻¹⁰ lambert. 

The first problem was to determine a ratio 
scale of subjective lightness for these papers. 
This task is less easy than the comparable 
problem with luminous spots. The apparent 
lightness of gray papers is a rather difficult thing 
to abstract and assess on a ratio scale, especially 
when some of the papers seem tinged yellowish 
or greenish. Some Os object that the task is 
impossible, but they proceed to make their 
judgments with fair consistency. In the present 
stage of this research we must regard our results 
as more illustrative than definitive. For the 
point we are concerned with here, however, the 
results are quite conclusive. 

We used three variations of the method of 
direct magnitude estimation. Typically a 
particular gray was shown to O and he was told 
to call it by some arbitrary number. It was then 
removed and the stimuli were presented twice 
each in irregular order. O was told to assign 
numbers proportional to the apparent lightness. 


The median results for groups of ten 
Os are shown in Fig. 12A. In one experi- 
ment, the black was presented first and 
called 1 (circles). In two other experi- 
ments a middle gray (at 72.7 db in Fig. 
12A) was presented first and called 10. 
In one of these experiments this gray 
remained continuously in view (tri- 
angles) ; in the other it was shown only at 
the beginning of each run (squares). 

The data in Fig. 12A show much ir- 
regularity, but taken all together the 
results suggest that under these viewing 
conditions the subjective lightness of a 
series of grays is roughly a power function 
of the reflectance. The exponent, esti- 
mated from the straight line in the plot, 
is 1.2, which is, as expected, much larger 
than the exponent determined by the line 
in Fig. 11A. Pending a better determination 
of the nature of the scale for 
subjective lightness, we will use the line 
in Fig. 12A as the subjective magnitude 
scale against which to plot the results of 
category judgments, as represented especially 
by the Munsell Value Scale. 
Munsell Value Scale.—Perhaps the 
best known and best determined scale of 
lightness is the Munsell Scale of Value 
(36). It is commonly regarded as a 
scale of subjective magnitude, but it 
seems clear from the procedures (33) 
used to work out the relative spacing of 
the steps that the scale really belongs 
to the class we have called category 
scales. Although Newhall describes the 
direct estimation of magnitude ratios as 
a possible procedure for determining the 
spacing of the Munsell colors, he states 
that “greatest reliance was placed on the 
difference-form,” in which O judges the 
intervals between the colors (34, p. 621). 
The steps of the Munsell scale were 
adjusted to produce equal-appearing 
intervals under simultaneous viewing, 
but it also turns out that the scale pre- 
dicts well the outcome of rating-scale 
judgments under successive viewing. 
Each of the three curves in Fig. 12B 
shows the form of the Munsell scale as a 
function of the luminance produced by 
neutral grays under ordinary illumination. 
The abscissa is proportional to the 
logarithm of the reflectances specified 
by the Munsell Scale of Value. Note 
the similarity between the curves in 
Fig. 12B and 10A, in both of which 
the category scales are plotted against 
decibels. 

Fig. 12. Magnitude and category scales for lightness of neutral grays. A. Direct magnitude estimations plotted in log-log coordinates. B. Curves showing the form of the Munsell Scale of Value; points show results of various category judgments and of judgments of relative spacing by means of sliding markers. C. Munsell scale plotted against the scale of brils determined by the line in A. 

The Munsell Scale of Value can be 
represented quite precisely by a cube-root 
function. We determined this relation 
with some pains, only to discover 
that the same equation (along with 
others) had been published 9 months 
earlier (26). An equation that relates 
Munsell Value, V, to reflectance, R, 
with negligible error is given by 


V = 2.467 R^{1/3} - 1.636. 


This cube-root relation, we recall, 
is the law proposed by Plateau more 
than a hundred years ago. His was 
also a determination of equal-appearing 
intervals, and although his reasoning 
was not so firmly rooted in fact as are 
our present conceptions, it is fitting 
that the originator of these methods 
should have hit so close to the mark. 
We have already called attention to the 
curious coincidence that a cube-root 
function is approximated both by the 
category scale (Munsell Value) for 
neutral colors seen under ordinary con- 
ditions of viewing and by the magnitude 
function (brils) for luminous spots seen 
in the dark. We must emphasize that 
this apparent agreement is probably 
no more than a coincidence, albeit an 
interesting one. 
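
The cube-root equation for Munsell Value is easy to evaluate, and doing so shows why the scale is concave upward against decibels. The sketch below assumes that R is expressed in percent, which the text does not state but which places a perfect white (R = 100) close to Value 10. The three reflectances are hypothetical sample points spaced 10 db apart, and the function name is an illustrative choice.

```python
def munsell_value(reflectance_percent):
    """Munsell Value from the cube-root equation, with R assumed to be in percent."""
    return 2.467 * reflectance_percent ** (1.0 / 3.0) - 1.636


# Hypothetical reflectances of 1, 10, and 100 percent, i.e. equal 10-db steps.
reflectances = [1.0, 10.0, 100.0]
values = [munsell_value(r) for r in reflectances]

# Equal steps in decibels give growing steps in Value: concave upward.
increments = [upper - lower for lower, upper in zip(values, values[1:])]
```

A logarithmic (Fechnerian) scale would give equal increments for equal decibel steps. Here the second increment is roughly twice the first, which is the upward concavity seen in Fig. 12B.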

The points in Fig. 12B represent 
data from three different experiments 
that may be regarded as validations 
of the Munsell scale. Newhall’s (35) 
experiment was the most thorough. 
His ten Os viewed a series of grays 
presented simultaneously and indicated 
their apparent spacing by means of 
markers on a grid. 

The points on the middle curve 
relate to two studies involving category 
judgments of the same series of 10 
papers we used to determine the mag- 
nitude function in Fig. 12A. 

The three sets of points near the 
top curve are of special interest. These 
were obtained from five Os who judged 
seven neutral grays, five times each, 
on a 9-point scale defined by the adjec- 
tives, “‘very, very light” to “very, very 
dark.” The different kinds of symbols 
refer to the different backgrounds against 
which the stimuli were seen. Our in- 
terest in these results stems not only 
from the fact that the different back- 
grounds were able to shift the average 
adjectival rating assigned by O under 
these conditions, but also from the 
fact that the data exhibit the same 
general curvature as the Munsell scale. 
Yet it was these data that Michels and 
Helson (29) used to test their “reformulation 
of the Fechner law,” which is a 
mathematical development of the theory 
of adaptation level. (Some of the 
defects of this theory are discussed 
elsewhere [62].) They passed straight 
lines through the three sets of points 
and concluded that Fechner’s logarithmic 
law was thereby verified. On this 
foundation they built a quantitative 
theory designed to transform “rating-scale 
data into true psychophysical data.” 

But two facts now seem clear: (a) 
rating-scale judgments of lightness do 
not follow Fechner’s law, which would 
be a straight line in Fig. 12B, and (b) 
even if they did, the fact would have 
no bearing on Fechner’s law, because 
category judgments do not reveal the 
form of the ratio scale of psychological 
magnitude. 

If we now proceed to use the line 
in Fig. 12A as an approximate estimate 
of the ratio scale for the lightness of 
neutral grays, we can make this scale 
serve as the abscissa against which to 
plot the Munsell Scale of Value, as in 
Fig. 12C. Here we see that the category 
scale of apparent lightness is concave 
downward, relative to the ratio scale 
of apparent lightness. Measured by 
the ratio scale, the “equal appearing 
intervals” of the Munsell scale turn 
out not to be equal. 
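
The inequality of the “equal appearing intervals” can be sketched by combining the cube-root equation for Munsell Value with the power function of Fig. 12A. The sketch below assumes that reflectance R is in percent and that subjective lightness is proportional to R raised to the exponent 1.2; the text itself treats that exponent as provisional. The chosen Values and the function names are illustrative.

```python
def reflectance_for_value(value):
    """Invert V = 2.467 * R ** (1/3) - 1.636 to obtain R (assumed to be in percent)."""
    return ((value + 1.636) / 2.467) ** 3


def relative_lightness(reflectance_percent, exponent=1.2):
    """Subjective lightness in arbitrary units under the assumed power function."""
    return reflectance_percent ** exponent


# Equal steps on the category (Munsell) scale.
munsell_steps = [2.0, 4.0, 6.0, 8.0]
lightness = [relative_lightness(reflectance_for_value(v)) for v in munsell_steps]

# Size of each equal Munsell step measured on the ratio scale.
ratio_scale_gaps = [upper - lower for lower, upper in zip(lightness, lightness[1:])]
```

Under these assumptions each successive Munsell step covers a larger stretch of the ratio scale than the one before, which is equivalent to saying that the category scale is concave downward when plotted against subjective lightness.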

In these experiments with neutral 
gray papers we have seen an instance 
in which a category scale is relatively 
easy to establish, but in which a ratio 
scale of subjective magnitude is ex- 
tremely difficult. On some of the other 
continua, such as length and duration, 
it may be easier to determine a magni- 
tude scale than a category scale. The 
various continua differ considerably in 
this respect. 

The same can be said of the practical 
usefulness of the two classes of scales. 
As a category scale the Munsell Value 
system has wide practical utility, and 
the supplying of standards for it is an 
important commercial enterprise. On 
the other hand, category scales of 
length and duration have little con- 
ceivable utility, whereas estimations of 
ratio magnitudes of these quantities 
are frequently made by many of us. 
Loudness provides an example in which 
both category scales (musical dynamics) 
and ratio scales (sones) are widely 
used for practical purposes. Some in- 
dustrial noise standards are written 
in terms of sones (1). 


Continua of Class II 
(Metathetic) 


Metathetic continua are those on 
which the category scales may be 
linear. Whereas a prothetic continuum 
is characterized by the fact that dis- 
crimination varies from one end to the 
other, the distinguishing feature of a 
metathetic continuum is the uniformity 
of discrimination over the range. This 
uniformity of sensitivity refers to a 
constancy measured in subjective units, 
not to a constancy measured in units 
of some arbitrary stimulus scale. On 
some continua of Class II the stimulus 
difference that is just noticeable may 
vary widely, whereas the just noticeable 
difference measured in subjective units 
remains nearly constant. Pitch, meas- 
ured in mels, is a good example of a 
continuum of this sort (55). As we 
shall see, the uniformity of sensitivity 
on continua of Class II is seldom more 
than approximate. In particular, it 
can apparently be distorted by such 
factors as landmarks and differential 
familiarity. 

Another thing that can be said of 
Class II is that it probably includes 
the dimensions that are ordinarily called 
qualitative, as opposed to quantitative. 
It seems also to include positional 
discriminations. Judgments based on 
position actually provide some of the 
clearest examples of Class II continua. 
In short, then, metathetic or Class II 
continua seem to concern what and where, 
as opposed to how much. 

Continua based on what and where, 
comprising, as they do, many disparate 
aspects of things, constitute a less 
unitary and well behaved group of 
discriminations than continua based on 
how much. In a sense, the qualitative 
aspects of things have to be forced into 
scalable continua. They do not fall 
as naturally onto unitary scales as do 
the quantitative aspects of things. Psy- 
chologically speaking, size is more scal- 
able than sort. 

We realize that any attempt to 
impose a simple order on the complex 
welter of our perceptions runs the 
danger of oversimplifying what is not 
simple. The clear instances that ex- 
emplify Class I and Class II are clear- 
cut enough—like day and night—but 
this does not mean that there is no 
penumbrous twilight to dull the sharp 
edges of our classifying. Some continua 
may elude this simple scheme, for they 
may be compounded of multiple sorts 
of discrimination. 


Visual Position (Azimuth) 


A simple experiment that illustrates 
the nature of a category scale on a 
metathetic continuum is one suggested 
to us by G. A. Miller. The O sat 
before a square of white cardboard 
30 cm wide. Above the top edge of 
the cardboard a small pointer was made 
to appear for about 1 sec. and O judged 
the position of the pointer on a 15-point 
numerical scale. The pointer appeared 
in irregular order at one or another of 
29 positions spaced 1 cm. apart. 

The mean category assignments are 
shown in Fig. 13. This plot reveals 
two interesting features: (a) the category 
scale approximates a straight line, but 
(b) there is a “hook” at either end. 
These hooks result from the fact that 
the two ends of the cardboard constitute 
landmarks, in the vicinity of which 
discrimination is relatively better than 
it is elsewhere on the scale. As usual, 
the improved discrimination increases 
the local slope of the function. Without 
exception, all ten Os called the first 
position (1 cm. from the end) Category 
1 and the second position Category 2. 
The steepness of the function determined 
by the first two points entails 
a compensatory flattening in the vicinity 
of points 3, 4, and 5. Since O’s category 
scale of position is essentially linear, 
when his judgments suffer local perturbations 
due to landmarks he tends to 
correct for the resulting distortions 
as soon as he finds himself out of the 
range of their influence. 

The essentially similar picture at the 
top end of the scale is blurred by the 
presence of an ambiguity. When in- 
structed to use a 15-point scale, some Os 
assume that Category 15 extends inward 
from the far edge of the cardboard, but 
others assume that Category 15 begins 
at the far edge and extends outward. 
This latter conception could probably 
be eliminated by more explicit instruc- 
tions. 

For making available some extensive 
unpublished data on other judgments 
of positions we are indebted to J. 
Volkmann. These experiments were 
conducted at Mt. Holyoke College in 
the large “cyclorama,” which consists 
of a circular screen on a 30-ft. radius 
and subtending an arc of 160°. To a 
viewer located at the center of curvature, 
the field presents a minimum of land- 
marks or discontinuities. 

In one experiment by Volkmann, 
Rauscher, and Powers, four Os estimated 
the azimuth position of stimuli pre- 
sented at 27 positions ranging from 
10.5° to 49.5° to the left of straight 
ahead. Each O judged each stimulus 
45 times and the means of all the judg- 
ments are plotted in Fig. 13B. This 
procedure of estimating position in 
degrees is essentially the method of 
magnitude estimation which we have 
used to determine ratio scales of subjective 
magnitude for continua of Class 
I. The procedure applied to judgments 
of position yields a function that is 
essentially linear, although there appears 
to be a systematic tendency for O to 
overestimate the divergence of the 
azimuth relative to straight ahead. 
This may be due to the fact that the 
stimuli were all to one side of center (zero 
azimuth). Despite this systematic error, 
the linearity of the function is clear. 

Fig. 13. Scales for position, azimuth, and inclination. Panel titles: position on a line; azimuth position, estimation in degrees (Volkmann, Rauscher & Powers); azimuth position, 8-, 6-, and 4-point scales (Volkmann, Latil & Kellogg); visual inclination, 6-point scale (Volkmann). 

When azimuth positions are judged in 
terms of a category scale the function 
is also linear (Fig. 13C). Three Os 
judged the positions of stimuli presented 
in irregular order at 5° intervals over 
a range of 115°. Each O judged each 
stimulus 25 times (unpublished data 
of Volkmann, Latil, and Kellogg). 

When the number of categories is as few as four (squares) the
function takes on a stepwise character. Some of this same effect can
also be seen in Fig. 8C representing loudness judgments on a 3-point
scale. In Fig. 13C we find that suggestions of steps are also
evident even with the larger numbers of categories. Such steps are
generally to be expected with a continuum on which discrimination is
good and the number of stimuli greatly exceeds the number of
categories. It is interesting to note that the hooks we observe at
the ends of the scale in Fig. 13A are not evident in Fig. 13C. Just
the opposite is the case: instead of starting out with a steeper
slope, the curves in Fig. 13C start out with a flatter slope. Thus
we see that, in the cyclorama, discrimination is not improved at the
ends of the scale, for there are no landmarks to aid O in his
judgment.


Visual Inclination 


The apparent inclination of lines is 
another continuum that appears to 
belong to Class II. Rogers (45) de- 
scribed an experiment in which O 
judged the inclination of lines on a 
5-point scale, and although he did not 
publish the data themselves it is clear 
from the positions of the category 
thresholds, which he reported, that the 
category scale is essentially linear against 
angle of inclination. 

Fortunately we were able to obtain 
from J. Volkmann some of the data he 
gathered in a similar experiment con- 
ducted at Harvard in 1935. His three 
Os viewed a luminous line 10 cm. long 
at the end of a dark tunnel about 6 ft. 
long. The line was visible for 1 sec. 
at a time and was presented at 33 
angular positions ranging from 21° to 69° 
from the horizontal. Two shorter stimu- 
lus ranges were also used. The O 
was given two practice series and then 
made 25 (or more) judgments of each 
stimulus in random order. The mean 
category judgments for two stimulus 
ranges are plotted in Fig. 13D. Note 
that the stimulus ranges used do not 
include the natural landmarks that 
undoubtedly occur at 0° and 90°. 




The linearity of these functions sug- 
gests that visual inclination is a meta- 
thetic continuum, but in order to test 
this conclusion we need to know that 
the magnitude scale for visual inclination 
is also linear against degrees. The 
linearity of this latter scale was amply 
demonstrated in a series of experiments 
(46) at Mt. Holyoke College, in which 
53 Os estimated the inclination (bearing) 
in degrees of a luminous line presented 
at 35 positions covering a range of 110°. 
Except for an understandable tendency 
for O to say “45 degrees” for most 
stimuli within a small range of this 
value, the magnitude function is linear. 

These Es then went on to construct a scale by another method, that
of bisection (44). They called the unit of inclination an enc and
here again the scale turned out to be linear against the physical
measure of the stimulus (degrees). Thus for a Class II magnitude of
this sort we find that the method of magnitude estimation, in which
O has free use of any numbers he deems appropriate, agrees with
rating-scale judgments in which O is confined to a finite set of
numbers, and with the method of bisection in which O adjusts a
stimulus to produce two equal-appearing intervals. The construction
of an interval scale of subjective magnitude by the method of
bisection is legitimate only on Class II continua, i.e., when
bisection can be shown to agree with magnitude estimation.
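The point about bisection can be illustrated with a small sketch. It assumes, purely for illustration, that a metathetic scale is linear in the stimulus and that a prothetic scale follows a power function; the function names and the exponent 0.5 are hypothetical and are not taken from the experiments described here.

```python
def bisect_linear(s_low, s_high):
    """Stimulus lying subjectively midway between two stimuli,
    assuming a linear subjective scale psi(S) = a*S + b."""
    return (s_low + s_high) / 2.0


def bisect_power(s_low, s_high, exponent):
    """Stimulus lying subjectively midway between two stimuli,
    assuming a power-function scale psi(S) = k * S**exponent.
    The constant k cancels and so does not appear."""
    mid_psi = (s_low ** exponent + s_high ** exponent) / 2.0
    return mid_psi ** (1.0 / exponent)


if __name__ == "__main__":
    # Linear scale: the bisection point is the arithmetic mean.
    print(bisect_linear(21.0, 69.0))
    # Power-function scale (illustrative exponent): the bisection
    # point falls below the arithmetic mean of 50.
    print(bisect_power(10.0, 90.0, 0.5))
```

Under these assumptions the bisection point agrees with the arithmetic mean of the stimuli only when the scale is linear, which is one way to see why agreement between bisection and magnitude estimation is informative.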


Proportion 


Another continuum that behaves in 
some respects as though it belongs to 
Class II is one called “‘color mass” 
by Philip (40) in whose experiment 
O’s task was to rate on an 11-point 
scale the number of dots of one color in 
irregular patterns made up of dots of 
two different colors. The number of 
dots of one color decreased as the number 
of another color increased. Thus one 
series ran from 13 green plus 23 blue 
to 23 green plus 13 blue. 

The mean results of several thousand judgments by seven Os are shown
by the circles in Fig. 14A. The fact that the average category scale
is approximately linear may appear at first sight to be inconsistent
with the fact that the category scale for the numerousness of dots
is concave downward (Fig. 3A). The explanation appears to be that
the judgment of so-called "color mass" is a judgment of the relative
proportions of two numbers of dots (blue and green, say) in a
situation in which the total number of dots is always fixed at 36.
Under these conditions we may reasonably expect to find that
discrimination is no better at one end of the scale than at the
other, for it should be as easy to discriminate 13 green dots among
23 blue as to discriminate 13 blue dots among 23 green. Unlike the
continuum of numerousness, the continuum of relative proportion
seems to involve no progressive increase in the difficulty of
discrimination.

Fig. 14. A. Magnitude and category scales for judgments of
proportion. B. Category scale for pitch with closely spaced stimuli.
[Figure not reproduced. Panel A: proportion of colored dots; category
scales and magnitude scale (percentage) against number of dots of
one color in a total of 36. Panel B: category scale against
frequency in cycles per second (Truman & Wever).]

It may be, of course, that on a continuum of this kind
discrimination passes through a minimum near the middle of a scale
where the dots of the two colors are equal in number, but this
possibility would not produce a concavity in the over-all function.
It might, however, produce a kind of S shape.

In order to test this possibility we made up a set of cards each
containing 36 dots in two colors, blue and green. The dots were [?]
mm. in diameter and were spread irregularly over an area about 8 cm.
square. The number of one color varied from 3 to 33 in steps of
three. Two different stimulus cards were prepared for the numbers 3,
6, 30, and 33, but only one card was prepared for each of the
others. Ten Os judged each stimulus four times, on a 7-point scale,
twice estimating the proportion of blue and twice estimating the
proportion of green. Five Os judged blue first and five judged green
first. They viewed each card for about 1 sec. from a distance of
about 1 m. An example of each of the extreme cards was exhibited at
the beginning of each experiment, and O was told that all the cards
had the same total number of dots.


As seen in Fig. 14A, the expected S shape emerges quite clearly. In
this test O's task was easier than in Philip's study. The end
stimuli were so easily identified that only 3 out of 40 judgments
were other than 1 or 7. The next two stimuli were so obviously
different from the end stimuli that they were nearly always called 2
or 6. Consequently the slope of the curve is steep near both ends
where discrimination is good. The steepness near the ends of this
curve is analogous to that observed in Fig. 13A. For the other
stimuli the variability increased, indicating that discrimination
deteriorates over the central region of the scale. And as usual,
where discrimination is poorer the slope of the category scale is
less steep. Related to the poorer discrimination over the middle of
the range is the fact that O's reaction time is longer for these
stimuli than for those at the ends. Evidence for this fact is
presented by Johnson (25) who experimented with patterns of red and
black dots and by Lemmon (27) who measured disjunctive reaction time
in a somewhat analogous experiment. Johnson stressed the finding
that reaction time increases in the vicinity of a category boundary,
but to this should be added the principle that reaction time also
increases when discrimination becomes more difficult. It thus seems
probable that two interrelated factors, decision and discrimination,
enter into the determination of reaction time.

Actually there is some indication that near the exact center of the
range, where the dots of the two colors are equal in number,
discrimination improves over what it is on either side of center. If
this is true the curve should tend to assume the form of a double S,
being steeper again near the middle stimulus of the series. Our
stimuli were probably spaced too far apart to reveal this detail,
but it might be an interesting point to explore.
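The S shape discussed above (steep near the ends, flatter over the middle) can be checked from a set of mean category judgments by comparing local slopes. The sketch below is only an illustration: the function names are hypothetical, and the mean judgments in the example are invented to show the pattern, not read from Fig. 14A.

```python
def local_slopes(stimuli, mean_categories):
    """Slope of the category scale between each pair of adjacent stimuli."""
    pairs = list(zip(stimuli, mean_categories))
    return [
        (c2 - c1) / (s2 - s1)
        for (s1, c1), (s2, c2) in zip(pairs, pairs[1:])
    ]


def is_steeper_at_ends(slopes):
    """True when both end segments are steeper than every interior segment."""
    if len(slopes) < 3:
        return False
    interior_max = max(slopes[1:-1])
    return slopes[0] > interior_max and slopes[-1] > interior_max


if __name__ == "__main__":
    # Number of dots of one color: 3 to 33 in steps of three (as in the text).
    dots = list(range(3, 34, 3))
    # Invented mean judgments on a 7-point scale, for illustration only.
    means = [1.0, 2.0, 2.6, 3.1, 3.6, 4.0, 4.4, 4.9, 5.4, 6.0, 7.0]
    slopes = local_slopes(dots, means)
    print(is_steeper_at_ends(slopes))
```

A double S of the kind conjectured in the text would show up as an additional local maximum of the slope near the middle stimulus, which this simple end-versus-interior comparison does not test for.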


In order to try to determine the form of the magnitude scale for
proportions of colored dots we asked another ten Os simply to judge
the percentage of blue and of green. The procedure was identical to
that described above except that O was not shown the end stimuli at
the beginning of the experiment. He was thus left free to express
his judgment of proportion without any points on the scale being
defined.


The mean percentages are plotted as squares in Fig. 14A. Although
the curve is a little straighter than that produced by the 7-point
scale (triangles), it shows some of the same tendency to assume an S
shape. Under these circumstances the psychological scale of
proportion is not precisely linear against physical proportion. The
general agreement between the percentage scale and the category
scale seems to confirm the view that this continuum is metathetic,
although there is in fact more curvature in the category scale.

Actually, the continuum of proportion is an instance of a magnitude
made up of two segments of prothetic continua tied together in a
complementary fashion. The numerousness of one kind of dot increases
as the numerousness of another kind decreases. Perhaps many continua
involving proportions are constituted in this manner and obey
similar laws. These laws may break down, however, if the two (or
more) kinds of stimuli juxtaposed in a proportion combine to produce
a new perceptual quality or continuum, so that the original continua
no longer appear to be present. An instance of this sort would be a
mixture of blue and yellow lights to produce a gray. Conversely, it
may sometimes happen that when we try to arrange an experiment to
elicit judgments of a single continuum, like length or area, an
increase in the stimulus may be accompanied by a decrease in some
other aspect of the field, and may lead thereby to a judgment of
proportion. There are numerous pitfalls in this business.

The experiments on the judgment of proportion are interesting in
another respect, for they constitute a situation in which "quantity"
is fixed (at 36 dots) and "quality" varies in what is obviously a
substitutive or metathetic manner, e.g., green dots are substituted
for blue dots. This process provides a suggestive paradigm for some
types of theories regarding other qualitative discriminations. The
perception of hue, for example, is thought by some to depend on the
relative proportions of different types of excitation, and the same
can be said of qualities in other modalities, such as taste, smell,
touch, etc. Whenever these qualities constitute scalable continua,
we have reason to expect that they will behave in some ways like the
continuum on which the different kinds of dots vary in their
relative proportions.


Pitch 


According to the place theory of 
pitch, this continuum reduces (except 
possibly for low frequencies) to one of 
position on the basilar membrane. 
Hence pitch has been regarded as the 
prototype of a substitutive discrimination
(51), one that should exhibit no
“time error,” and one on which just 
noticeable differences should be sub- 
jectively equal. To these expectations 
we may now add the hypothesis that 
pitch should behave as a Class II or 
metathetic continuum as regards the 
linearity of category scales. 

First let us look at some older evidence 
obtained by Truman and Wever (70) 
in their experiment designed to deter- 
mine the DL for frequency by the 
method of single stimuli (or absolute 
judgment, as they called it). Each of 
three Os made 100 judgments, on a 
3-point scale, of each of five frequencies. 
The average category assignments made 
by each O are plotted in Fig. 14B. 
These curves should be compared to 
those in Fig. 5A. Although the pro- 
cedures were essentially the same for 
the two experiments (lifted weights 
and pitch) the category scales are 
obviously different. In the case of
pitch the middle stimulus (520 cps) 
is not assigned to a category higher 
than the middle category, i.e., there 
is no “negative time-error,” as there is 
with lifted weights. Except by one O 
to whom the tones tended to sound 
“flat’’—a not uncommon reaction among 
the musically trained when subjected 
to relatively pure tones—the middle 
stimulus was assigned almost precisely 
to the middle category. 

The other important feature of Fig. 
14B is that the curves are not concave 
downward. In this respect again they 
differ from the example of a Class I 
continuum shown in Fig. 5A. 




When we turn to wider ranges of 
frequency we become involved with the 
mel scale of subjective pitch (66). In 
its revised version (65) this scale has 
been used fairly widely and has been 
shown to bear a suggestively close rela- 
tion to the positions at which various 
frequencies activate the organ of Corti. 

The original mel scale was constructed 
by the method of fractionation, but for 
the revised version both this method 
and the method of equisection were 
used. As the authors pointed out at 
the time, there is some question whether 
the method of fractionation, when ap- 
plied to a continuum of this sort, is not 
really a version of bisection. It is not 
unreasonable to regard it in this way, 
for there is a serious question whether 
ratios mean the same thing on a meta- 
thetic continuum as they do on a 
prothetic continuum. A defensible view 
is that what we have called magnitude 
scales on Class II continua are never 
more than interval scales, and that ratio 
scales are possible only on continua of 
Class I. Perhaps we ought not to use 
the term magnitude for subjective scales 
on Class II continua, but an alternative 
term has not suggested itself.

On the other hand, differences on in- 
terval scales may be regarded as ratio 
magnitudes—measurable on ratio scales 
(53). Consequently, whenever a psy- 
chological interval-scale has some end 
point, artificial or natural, like the lowest 
pitch one can hear, it may be possible to 
judge apparent “distances” from the end 
point in terms of a ratio scale. In this 
sense it may not be unreasonable to ask 
O to set one tone to a pitch that sounds 
half as high as another (fractionation) or 
to make a direct estimation of pitch 
magnitude. 
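The argument that differences on an interval scale behave like ratio magnitudes can be made concrete: an interval scale is fixed only up to a linear transformation, which changes ratios of raw scale values but leaves ratios of distances from a common end point unchanged. The sketch below is a generic illustration with hypothetical names and made-up numbers, not a computation from the pitch data.

```python
def rescale(value, slope, intercept):
    """Admissible transformation of an interval scale: slope*value + intercept."""
    return slope * value + intercept


def distance_ratio(x, y, end_point):
    """Ratio of the distances of two scale values from a common end point."""
    return (x - end_point) / (y - end_point)


if __name__ == "__main__":
    high, low, bottom = 400.0, 200.0, 100.0   # arbitrary scale values
    slope, intercept = 2.0, 50.0              # arbitrary change of unit and zero

    new_high = rescale(high, slope, intercept)
    new_low = rescale(low, slope, intercept)
    new_bottom = rescale(bottom, slope, intercept)

    # Ratios of raw values are not preserved by the transformation ...
    print(high / low, new_high / new_low)
    # ... but ratios of distances from the end point are.
    print(distance_ratio(high, low, bottom),
          distance_ratio(new_high, new_low, new_bottom))
```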

Two questions then present themselves:
(a) Is it possible by the method of
direct magnitude estimation to obtain a
function that agrees with the mel scale?
(b) Are category judgments linear against
the mel scale? The answer to both these
questions appears to be positive, but
both present certain interesting problems
and difficulties.





Fig. 15. Magnitude and category scales for pitch: A, direct
magnitude estimations plotted in log-log coordinates; B, C, D,
category judgments plotted against mels (revised scale). [Figure not
reproduced. Panel A: pitch in mels against frequency in cycles per
second, showing the original mel scale, the revised mel scale, and
the median magnitude estimates with interquartile ranges. Panels B,
C, D: category scale against frequency in cycles per second and
pitch in mels.]

An answer to the first question is contained in the results of an
experiment in which 20 Os made direct magnitude estimations of the
pitches of a series of tones of medium loudness. Each tone was
presented twice and the order was different for each O. No standard
was used, which means that O gave the first tone presented a number
of his own choosing, and then tried to assign to the other tones
numbers reflecting the relative pitches of the tones. In order to
take medians of the resulting data we multiplied each O's estimates
by whatever factor was necessary to make the estimate for 1000 cps
equal to the value 1000.

The magnitude estimation of pitch is apparently a difficult
judgment, and many of the Os registered some degree of protest at
having to make it. Nevertheless, the results in Fig. 15A are
interesting, even though highly variable.
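The normalization described for these estimates (scaling each O's numbers so that the estimate for 1000 cps equals 1000, then taking medians across Os) can be sketched as follows. This is a minimal illustration rather than the authors' computation: the function names and the sample numbers are hypothetical, and it assumes one estimate per tone per O, since the text does not say how the two presentations of each tone were combined.

```python
from statistics import median


def normalize_to_reference(estimates, reference_hz=1000, reference_value=1000.0):
    """Scale one O's estimates so the estimate at reference_hz equals reference_value.

    estimates: dict mapping frequency (cps) to that O's magnitude estimate.
    """
    factor = reference_value / estimates[reference_hz]
    return {hz: value * factor for hz, value in estimates.items()}


def median_estimates(all_observers, reference_hz=1000, reference_value=1000.0):
    """Median normalized estimate at each frequency across Os."""
    normalized = [
        normalize_to_reference(obs, reference_hz, reference_value)
        for obs in all_observers
    ]
    frequencies = sorted(normalized[0])
    return {hz: median(obs[hz] for obs in normalized) for hz in frequencies}


if __name__ == "__main__":
    # Hypothetical raw estimates from three Os using very different number ranges.
    observers = [
        {500: 5, 1000: 10, 2000: 15},
        {500: 300, 1000: 500, 2000: 900},
        {500: 40, 1000: 100, 2000: 250},
    ]
    print(median_estimates(observers))
```

Because each O chooses his own modulus when no standard is given, some such rescaling is needed before estimates from different Os can be pooled; the median is then insensitive to the occasional extreme estimate.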

For the most part, the magnitude estimates lie between the two mel
scales, perhaps with a tendency to fall closer to the original
scale. This outcome is not unreasonable when we recall that the
original scale was constructed with apparatus that did not permit
the presentation of a frequency low enough to be close to the bottom
of the pitch scale. For the revised scale, O was allowed to hear a
tone of 40 cps in order to acquaint himself with what a really low
pitch sounds like. Hearing this low frequency appears to have
altered O's conception of the pitch scale. Since the magnitude
estimations in the present experiment were made without benefit of a
low-frequency orienting tone, it is reasonable that the typical O's
conception of the pitch continuum should resemble the original mel
scale. In any case, the magnitude estimates, variable and difficult
as they are, seem to follow the general form of the scales
constructed by other methods.
Our attempts to construct a category scale for pitch reveal an
interesting complication in this continuum. Apparently the fact that
the typical O is very familiar with one end of the range (the
musical scale) and very unfamiliar with the other end (tones above
3000 or 4000 cps) makes it impossible to obtain a linear category
scale covering the entire continuum. We made three different
attempts to do it, with the results shown in Fig. 15B, where the
mean category judgments are plotted against mels (revised scale).

The category scales are all concave downward, although some of the
curves seem to show evidence of breaks. The reason seems to be that,
as one O said, "All those high notes sound alike; I've never heard
those high pitchless tones before." It is altogether plausible that
discrimination tends to be rather poor on that part of a continuum
with which one is not familiar. It is consonant with the common
observation that all Chinese look much alike to Westerners.
Furthermore, it is to be expected that the poorer discrimination
over the upper part of the pitch continuum would flatten the slope
of the upper end of the category scale.

This argument suggests, of course, that linear scales might be
achieved if the two halves of the continuum were tested separately.
We therefore divided the frequency range into two segments. The
results in Fig. 15C show that category scales that are linear
against mels can be obtained from the two segments of the continuum
taken separately. The upper segment gives data that are more
variable than the lower segment. The average range of the category
assignments was 2.3 for the lower segment (circles) and 3.1 for the
upper segment (squares). This is consistent with the feeling
expressed by some Os that in judging the high frequencies they were
adrift in uncharted waters.
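One rough way to describe whether a category scale plotted against mels is linear or concave downward is to compare the interior points with the straight line (chord) joining the two end points. The sketch below is offered only as an illustration of that idea; the function name and the example values are hypothetical and it is not the analysis used for Fig. 15.

```python
def chord_deviation(x, y):
    """Mean signed deviation of the interior points from the chord
    joining the first and last points.

    Positive: the curve lies above the chord (concave downward).
    Negative: the curve lies below the chord (concave upward).
    Near zero: approximately linear.
    """
    if len(x) < 3:
        raise ValueError("need at least one interior point")
    slope = (y[-1] - y[0]) / (x[-1] - x[0])
    deviations = [
        yi - (y[0] + slope * (xi - x[0]))
        for xi, yi in zip(x[1:-1], y[1:-1])
    ]
    return sum(deviations) / len(deviations)


if __name__ == "__main__":
    mels = [0.0, 1.0, 2.0]                          # arbitrary units
    print(chord_deviation(mels, [0.0, 1.0, 2.0]))   # linear
    print(chord_deviation(mels, [0.0, 1.5, 2.0]))   # concave downward
```

The measure depends on the end points, so it is sensitive to the hooks that landmarks can produce at the ends of a scale; a least-squares fit would weight all points equally and might be preferred when the end stimuli are suspect.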

The triangles in Fig. 15C represent 
another test for which the spacing of the 
stimuli was uniform (140 mels). For the 
first test (circles) the spacing was not 
precisely uniform. Apparently these 
two spacings were not sufficiently differ- 
ent to make much difference in the 
shapes of the curves. In these curves we
observe a slight tendency for the functions
to become S shaped. It is as
though the end stimuli, which were
sounded at the beginning of each run,
served as landmarks and produced an
effect not unlike that shown in Figs. 13A
and 14A (triangles).

When the relative spacing of the stimuli is made extremely
nonuniform by bunching the tones close together at the low or high
end of the scale we obtain the expected steepening of the slope over
the region of bunching. The results of three illustrative
experiments are shown in Fig. 15D. It is clear from these curves
that it is easier to make the category scale become concave downward
(plotted against mels) than it is to make it become concave upward.
Why is this so? The best we can suggest is that, in this frequency
range, category judgments of pitch tend to be influenced by the
nature of the musical scale, on which the steps are proportional to
the logarithm of the frequency. A few subjects say quite explicitly
that they try to base their conception of the categories on musical
intervals. When the mel separation of the stimuli is relatively
close at the low end of the scale, and the spacing therefore
resembles musical spacing, the effect is to enhance O's tendency to
judge in terms of musical steps rather than pitch distance, which in
turn makes the category scale highly concave downward when plotted
against mels. The bunching of the stimuli at the high end of the
scale produces an upward concavity, but the curvature is not very
great, presumably because it is counteracted by the tendency of some
Os to judge in terms of musical steps.
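The statement that steps on the musical scale are proportional to the logarithm of the frequency can be illustrated with a short sketch. It assumes the equal-tempered scale, with twelve semitones to the octave; the function name is hypothetical.

```python
import math


def semitones_between(f_low, f_high):
    """Musical distance in equal-tempered semitones between two frequencies.

    Assumes twelve equal semitones per octave (a doubling of frequency).
    """
    return 12.0 * math.log2(f_high / f_low)


if __name__ == "__main__":
    # Equal frequency ratios give equal musical steps, whatever the
    # absolute frequency: both of these are one octave.
    print(semitones_between(200.0, 400.0))
    print(semitones_between(1000.0, 2000.0))
```

An O who judges in musical steps would therefore space his categories evenly in log frequency rather than evenly in mels, which is the tendency the text invokes to explain the downward concavity.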

One further point deserves mention. 
Although the pitch continuum appears 
to belong to Class II there is a possibility 
that the low end (frequencies below 200 
cps?) may belong to Class I. It has 
often been argued that the perception of 
pitch may be mediated by “place” at 
high frequencies and by “periodicity” at 
low frequencies. It is possible therefore 
that the metathetic process at the high 
frequencies may give way to a prothetic 
process at the low end of the continuum. 
We have no direct evidence to back this
conjecture, but it is an interesting
possibility.


SUMMARY

Ratio scales of subjective magnitude are compared with category
rating-scales on several judgmental continua. These continua divide
themselves into two classes. Class I (prothetic) continua are those
on which the category scale is nonlinear (concave downward) relative
to the magnitude scale. Examples are apparent length, duration,
numerousness, area, weight, loudness, brightness and lightness. On
each of these continua the ratio scale of subjective magnitude
approximates a power function of the physical stimulus. The category
scale, however, is concave downward relative to the ratio scale,
principally because discrimination is relatively better at one end
of the continuum than at the other. In general, the category scale
is less curved than a logarithmic function.

On Class II (metathetic) continua the category scale may be linearly
related to the magnitude scale. Examples are visual position,
inclination, proportion, and pitch. On these continua discrimination
(expressed in subjective units) tends to be constant over the range,
although landmarks and differential familiarity may introduce
nonuniformities.

On both types of continua the form of the category scale depends on
the relative spacing of the stimuli and on the relative frequency of
their presentation. By means of an iterative procedure, the effects
of spacing and frequency can be neutralized and the "pure" form of
the category scale obtained.

The form of the category scale is generally independent of the
number of categories employed, and, except under special
circumstances, it is independent of the range and number of stimuli
used.

Unlike the category scale, the ratio scale of subjective magnitude
is relatively unaffected by stimulus spacing.

Although on metathetic continua "equal-appearing intervals" may be
equal, on prothetic continua such intervals are not equal in terms
of scales of subjective magnitude.
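The summary's statement that the ratio scale on a prothetic continuum approximates a power function of the stimulus implies a straight line in log-log coordinates, whose slope is the exponent. A minimal sketch of estimating that exponent by least squares is given below; the function name is hypothetical, the example values are synthetic, and the paper does not specify any particular fitting method.

```python
import math


def fit_power_function(stimuli, magnitudes):
    """Fit magnitude = k * stimulus**n by least squares in log-log coordinates.

    Returns (k, n). All stimuli and magnitudes must be positive.
    """
    log_s = [math.log10(s) for s in stimuli]
    log_m = [math.log10(m) for m in magnitudes]
    count = len(log_s)
    mean_s = sum(log_s) / count
    mean_m = sum(log_m) / count
    sxx = sum((ls - mean_s) ** 2 for ls in log_s)
    sxy = sum((ls - mean_s) * (lm - mean_m) for ls, lm in zip(log_s, log_m))
    exponent = sxy / sxx
    k = 10.0 ** (mean_m - exponent * mean_s)
    return k, exponent


if __name__ == "__main__":
    # Synthetic data generated from magnitude = 2 * stimulus**0.5.
    stimuli = [1.0, 10.0, 100.0]
    magnitudes = [2.0 * s ** 0.5 for s in stimuli]
    print(fit_power_function(stimuli, magnitudes))
```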


REFERENCES

1. Apps, D. C. The AMA 125-sone new vehicle noise specification. Noise Control, 1956, 2 (3), 13.

2. Arons, L., & Irwin, F. W. Equal weights and psychophysical judgments. J. exp. Psychol., 1932, 15, 733-751.

3. Baker, K. E., & Dudek, F. J. Weight scales from ratio judgments and comparisons of existent weight scales. J. exp. Psychol., 1955, 50, 293-308.

4. Beranek, L. L. Criteria for office quieting based on questionnaire rating studies. J. acoust. Soc. Amer., 1956, 28, 833-852.

5. Brown, D. R. Stimulus similarity and the anchoring of subjective scales. Amer. J. Psychol., 1953, 66, 199-214.

6. Callaway, D. B. Measurement and evaluation of exhaust noise of over-the-road trucks. Trans. Soc. automot. Engrs, 1954, 151-162.

7. Canter, R. R., & Hirsch, J. An experimental comparison of several psychological scales of weight. Amer. J. Psychol., 1955, 68, 645-649.

8. Cowdrick, M. The Weber-Fechner law and Sanford's weight experiment. Amer. J. Psychol., 1917, 28, 585-588.

9. Craik, K. J. W. The effect of adaptation on subjective brightness. Proc. roy. Soc., 1940, B128, 232-247.

10. Delboeuf, J. Étude psych[...]

11. Fernberger, S. W. On absolute and relative judgments in lifted weight experiments. Amer. J. Psychol., [...], 560-578.

12. Garner, W. R. An informational analysis of absolute judgments of loudness. J. exp. Psychol., 1953, 46, 373-380.

13. Garner, W. R. A technique and a scale for loudness measurement. J. acoust. Soc. Amer., 1954, 26, 73-88.

14. Gregg, L. W. Fractionation of temporal intervals. J. exp. Psychol., 1951, 42, 307-312.

15. Guilford, J. P. Psychometric methods. (1st ed.) New York: McGraw-Hill, 1936.

16. Guilford, J. P. Psychometric methods. (2nd ed.) New York: McGraw-Hill, 1954.





17. Guilford, J. P., & Dingman, H. F. A validation study of ratio-judgment methods. Amer. J. Psychol., 1954, 67, 395-410.

18. Guilford, J. P., & Dingman, H. F. A modification of the method of equal-appearing intervals. Amer. J. Psychol., 1955, 68, 450-454.

19. Hanes, R. M. The construction of subjective brightness scales from fractionation data: a validation. J. exp. Psychol., 1949, 39, 719-728.

20. Harper, R. S., & Stevens, S. S. A psychological scale of weight and a formula for its derivation. Amer. J. Psychol., 1948, 61, 343-351.

21. Heinemann, E. G. Simultaneous brightness induction as a function of inducing- and test-field luminances. J. exp. Psychol., 1955, 50, 89-96.

22. Helmholtz, H. v. Physiological optics. Vol. II. (J. P. C. Southall, Ed.) New York: Optical Society of America, 1924. P. 176.

23. Helson, H. Adaptation level as a basis for a quantitative theory of frames of reference. Psychol. Rev., 1948, 55, 297-313.

24. Hollingworth, H. L. The inaccuracy of movement. Arch. Psychol., N. Y., 1909, No. 13.

25. Johnson, D. M. The psychology of thought and judgment. New York: Harper, 1955.

26. Ladd, J. H., & Pinney, J. E. Empirical relationships with the Munsell value scale. Proc. Inst. rad. Engrs, 1955, 43 (9), 1137.

27. Lemmon, V. W. The relation of reaction time to measures of intelligence, memory and learning. Arch. Psychol., N. Y., 1927, No. 94.

28. Michels, W. C., & Doser, B. T. Rating scale method for comparative loudness measurements. J. acoust. Soc. Amer., 1955, 27, 1173-1180.

29. Michels, W. C., & Helson, H. A reformulation of the Fechner law in terms of adaptation-level applied to rating-scale data. Amer. J. Psychol., 1949, 62, 355-368.

30. Miller, G. A. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev., 1956, 63, 81-97.

31. Moon, P. Photometrics in astronomy. J. Franklin Inst., 1954, 258, 461-471.

32. Nash, M. C. An experimental test of the Michels-Helson theory of judgment. Amer. J. Psychol., 1950, 63, 214-220.


33. Newhall, S. M. The ratio method in the review of the Munsell colors. Amer. J. Psychol., 1939, 52, 394-405.

34. Newhall, S. M. Preliminary report of the OSA Subcommittee on the Spacing of the Munsell colors. J. opt. Soc. Amer., 1940, 30, 617-645.

35. Newhall, S. M. A method of evaluating the spacing of visual scales. Amer. J. Psychol., 1950, 63, 221-228.

36. Newhall, S. M., Nickerson, D., & Judd, D. B. Final report of the OSA Subcommittee on the Spacing of the Munsell colors. J. opt. Soc. Amer., 1943, 33, 385-418.

37. Newman, E. B., Volkmann, J., & Stevens, S. S. On the method of bisection and its relation to a loudness scale. Amer. J. Psychol., 1937, 49, 134-137.

38. Parducci, A. Direction of shift in the judgment of single stimuli. J. exp. Psychol., 1956, 51, 169-178.

39. Parducci, A. Incidental learning of stimulus frequencies in the establishment of judgment scales. J. exp. Psychol., 1956, 52, 112-118.

40. Philip, B. R. Generalization and central tendency in the discrimination of a series of stimuli. Canad. J. Psychol., 1947, 1, 196-204.

41. Postman, L., & Miller, G. A. Anchoring of temporal judgments. Amer. J. Psychol., 1945, 58, 43-53.

42. Preston, M. G. Contrast effects and the psychophysical judgments. Amer. J. Psychol., 1936, 48, 389-402.

43. Preston, M. G. Contrast effects and the psychometric functions. Amer. J. Psychol., 1936, 48, 625-631.

44. Reese, E. P. (Ed.) Summary Report from the Psychophysical Research Unit, Mt. Holyoke College. U. S. Navy, Special Devices Center, Tech. Rep. No. SDC 131-1-5, 1946-1952.

45. Rogers, S. The anchoring of absolute judgments. Arch. Psychol., N. Y., 1941, No. 261.

46. Rogers, S., Volkmann, J., Reese, T. W., & Kaufman, E. L. Accuracy and variability of direct estimates of bearing from large display screens. Mt. Holyoke College, Rep. No. 166-1-MHC 1, May 1947.

47. Ross, S., & Katchmar, L. The construction of a magnitude function for short time intervals. Amer. J. Psychol., 1951, 64, 397-401.

48. Seashore, C. E. Psychology of music. New York: McGraw-Hill, 1938. P. 89.





RATIO SCALES AND CATEGORY SCALES 411 


. Stevens, J. C., & Tutvinc, E. Loudness 
estimation iti a group situation. Amer. J. 
Psychol. (in press). 


50. Srevens,S.S. A scale for the measurement 


of a psychological magnitude: loudness. 
Psychol. Reo., 1936, 43, 405-416, 

. Stevens, S. S. On the problem of scales 
for the measurement of psychological 
magnitudes. J. unified Sci., 1939, 9, 
94-99, 

. Stevens, S. S. On the theory of scales of 
measurement. Science, 1946, 103, 677 
680. 

. Stevens, S.S. Mathematics, measurement 
and psychophysics. In S. S. Stevens 
(Ed.), Handbook of experimental psy- 
chology. New York: Wiley, 1951. Pp. 
1-49. 

. Stevens, S. S. Biological transducers. 
Convention Rec., Inst. Radio Engrs, 1954, 
Part 9, 27-33. 


5. Srevens, 8. S. Pitch discrimination, mels, 


and Kock’s contention. J. acoust. Soc. 


Amer., 1954, 26, 1075-1077. 


56. Stevens, $.S. Decibels of light and sound. 


Physics Today, 1955, 8, 12-17. 

. Srevens, 8. S. On the averaging of data. 
Science, 1955, 121, 113-116. 

. Srevens, 8. S. The measurement of loud- 
ness. J. acoust. Soc. Amer., 1955, 27, 
815-829. 

. Stevens, S. S. The direct estimation of 
sensory magnitudes—loudness. Amer. J. 
Psychol, 1956, 69, 1-25. 

. Stevens, S. S. The calculation of the 
loudness of complex noise. J. acoust. 
Soc. Amer., 1956, 28, 807-832. 


1. Stevens, S$. S. On the psychophysical law. 


Psychol. Rev., 1957, 64, 153-181. 

. Srevens, S. S. Adaptation level vs. the 
relativity of judgment. Amer. J. Psy- 
chol. (in press). 

. Srevens, S. S., & Davis, H. Hearing: its 
psychology and physiology. New York: 
Wiley, 1938. 


. Tuurstone, L, L. 


64. Stevens, S. S., & Poutron, FE. C. The 


estimation of loudness by unpracticed 
observers. J. exp. Psychol., 1956, $1, 
71-78. 


. Stevens, S. S., & Votxmann, J. The 


relation of pitch to frequency: a revised 
scale. Amer. J. Psychol., 1940, 53, 
329-353. 


. Stevens, S. S., Votxmann, J., & Newman, 


E. B. A scale for the measurement of the 
psychological magnitude pitch J. 
acoust. Soc. Amer., 1937, 8, 185-190. 


. Taves, E. H. Two mechanisms for the 


perception of visual numerousness. Arch. 
Psychol., N. Y., V941, 37, No. 265 

Fechner’s law and the 
method of equal-appearing intervals. 


J. exp. Psychol. 1929, 12, 214-224 


. Trrcenener, E. B. Experimental psychology 


Vol. Il, Part Hl. (Instructor's Manual). 
New York: Macmillan, 1923. P. Ixix. 


. Truman, S. R., & Wever, FE. G. The 


judgment of pitch as a function of the 
series. Univer. Calif. Publ. Psychol, 
1928, 3, 215-223. 


. Urpan, F. M. The Weber-Fechner law and 


mental measurement. J. exp. Psychol., 
1933, 16, 221-236, 


. Vorxmann, J., Hunt, W. A., & McGourry, 


M. Variability of judgment as a function 
of stimulus density. Amer. J. Prychol., 
1940, 53, 277-284. 


. Warren, R. M., & Warren, R. P. Effect 


of the relative volume of standard and 
comparison object on half-heaviness 
judgments. Amer. J. Psychol, 1956, 69, 
640-643. 


. Wever, E.G., & Zener, K. FE. The method 


of absolute judgment in psychophysics. 
Psychol. Rev., 1928, 35, 466-493. 

5. Wooprow, H. Weight-discrimination with 
a varying standard. Amer. J. Psychol., 
1933, 45, 391-416. 


(Received September 16, 1956) 





Journal of Experimental Psychology 
Vol. 54, No. 6, 1957 


INTELLIGIBILITY AS A FUNCTION OF
FREQUENCY OF USAGE¹


MARK R. ROSENZWEIG AND LEO POSTMAN 


University of California 


Frequency of usage of words has 
been demonstrated to be the most 
important factor determining thres- 
holds of visual recognition of indi- 
vidual words in tachistoscopic tests 
(5). There are some indications in 
the literature that frequency of usage 
may also be an important factor 
in determining intelligibility of words 
in auditory tests. The results ob- 
tained to date in audition do not 
seem conclusive, however, so we 
have attacked this question directly 
in two experiments, one in English²
and the other in French.³

Recognition of the possible role of frequency was shown in the
preparation of the PB (phonetically balanced) monosyllabic word lists
by the Psycho-Acoustic Laboratory (2). Words of an original list were
rated for familiarity, and the less familiar items were eliminated
from the final list. Black (1) in 1952 determined intelligibility
scores for more than three thousand words, and he investigated a
number of correlates of intelligibility. He concluded that
intelligibility scores varied with frequency of usage.⁴ He also
confirmed previous findings that longer words tend both to be more
intelligible and to occur less frequently in the language than do
shorter words. “Thus two contrary influences, word familiarity and
word complexity, appear to operate in the auditory recognition of
words somewhat independently of the phonetic content” (1, p. 417).
Black’s results are limited by the fact that he excluded from his
experiment words which were found in pretests to be more than 80% or
less than 20% intelligible. Howes (4) in a preliminary study
reanalyzed the results of Mason and Garrison (8) on the
intelligibility of three-word sentences. He correlated the
intelligibility scores of the sentences with the log frequency of
usage of the least frequent word in each sentence. The coefficient of
correlation was .68, indicating that the frequency of usage of the
least common word accounted for about half of the variance in
sentence thresholds.

¹ The main results of this research were presented in a paper read at
the Third International Congress on Acoustics, Cambridge,
Massachusetts, June 19, 1956.

² We wish to thank Mrs. Sheila Walsh for her aid in collecting the
data of the English experiment.

³ The French experiment was done by the first author at the Sorbonne
during a sabbatical semester. We wish to thank Professeur Paul
Fraisse, Directeur du Laboratoire de Psychologie Expérimentale,
without whose generous aid and hospitality this research could not
have been done.

⁴ The frequency count used by Black was the 1931 Thorndike list (15)
which is weighted heavily with children’s books.


Method
English Experiment 


Test materials.—The words used in the
English experiment are given in Table 1. In 
order to control for effects of word length, only 
monosyllabic words were used. The frequency 
of usage of each word was determined from the 
Lorge Magazine Count of 4,500,000 words 
(16), since we considered that this count would 
best approximate oral usage. Words were 
selected in four ranges of frequency: 1-3, 10-33, 
100-330, and 1000-3300. The words were 
drawn at random from the Psycho-Acoustic 
Laboratory PB lists until eight words had been 
selected for each of the three highest frequency 
categories. A few words were rejected in 




order to avoid too great similarity among 
items. After the experiment had been run, 
we discovered that, due to a clerical error, seven 
words had been included in the 10-33 range 
and nine words in the 100-330 range. There- 
fore the scores for these ranges were weighted 
appropriately in the statistical analyses. Since 
most low-frequency words had been eliminated 
from the PB lists during their construction, we 
obtained monosyllables of frequencies 1-3 by 
drawing at random from the Lorge Magazine 
Count. The 32 words were arranged in seven 
randomly ordered lists. A different additional 
word was placed at the beginning of each list 
to act as a buffer word in the test. The lists 
were then tape recorded. Each word was
preceded by a number and by the carrier phrase,
“You will write.” The speaker monitored his
voice level on the carrier phrase. Immediately
after the number of the word was given, white
noise was introduced into the recorder, masking
the carrier phrase and the test word. The first
list was recorded at a signal-to-noise ratio of
−12 db. Each successive list was masked by
4 db less noise, so that the lists became progressively
more intelligible. The seventh list was
recorded in the quiet. The words were given
at 10-sec. intervals; the lists were separated by
30-sec. intervals.
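
The masking schedule just described can be written out as a short sketch. It is only an illustration of the arithmetic: the function name `snr_schedule` and the use of `None` to stand for the list recorded in the quiet are assumptions made here, not part of the original procedure.

```python
from typing import List, Optional


def snr_schedule(first_db: float, step_db: float, n_lists: int) -> List[Optional[float]]:
    """Signal-to-noise ratio (db) of each list; the final list is in the quiet (None)."""
    # All lists but the last are masked; each successive list has step_db less noise,
    # i.e., a signal-to-noise ratio that is step_db higher.
    masked = [first_db + step_db * i for i in range(n_lists - 1)]
    return masked + [None]


# English experiment: seven lists, starting at -12 db, 4 db less noise per list.
english_lists = snr_schedule(-12, 4, 7)
print(english_lists)  # [-12, -8, -4, 0, 4, 8, None]
```

On this reading, a difference of three trials corresponds to 12 db of masking noise.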

Subjects.—The Ss were 109 students in summer
session courses in psychology at the University
of California. They were run in groups
of 15 to 20. They did not know the purpose of 
the experiment. 

Procedure.—The Ss were told that they 
would hear words in the presence of noise, and 
they were instructed to write in the appropriate 
space on their answer sheets whatever they 
heard at each stimulus presentation. They 
could write a word, a part of a word, or nothing 
at all. Before the test, examples were given 
of two words masked by rather intense noise 
and of the same words masked by rather weak 
noise. A separate answer sheet was provided 
for each of the seven lists. At the end of the 
experiment, S was instructed to turn over his 
booklet of answer sheets and to write on the 
back sheet all of the words that he could re- 
member from the test. This procedure was 
added because we were interested in the effects 
on recall of the initial familiarity of the items 
and of the frequency of recognition during the 
threshold experiment. Five minutes were al- 
lowed for the recall test. 

Following the recall test, S was instructed 
to write the name of his native language on his 
booklet. Eighty-seven Ss gave English as 
their native language; we shall refer to this 
group as the English Group. Twelve gave some
other language; this group will be designated
the Foreign Group. Ten gave both English
and another language; this group will be called
the Bilingual Group.

TABLE 1
Stimulus Words and Their Frequencies of Usage

English Experiment (frequency per 4,500,000) | French Experiment (frequency per 312,000)

Flange, Prism, Dram, Larch, Thrall, Tithe, Cull, Hoot | Hernie, Notice, Guidon, Pliant
Cud, Jab, Foe, Pelt, Pew, Gem, Hash | Visage, Bouton, Cabine, Niveau
Cheat, Ridge, Rouse, Lump, Apt, Barn, Curve, Bean, Ray | Besoin, Malade, Dedans, Argent
Age | Enfant, Petite, Moment

[The individual frequency values and the remaining words of the highest-frequency groups are illegible.]

Scoring procedure.—To determine thresholds,
the first correct transcription of each word was
scored for each S. Thus, for each S, the word
was given a score that ranged from 1 to 7, the
number being that of the trial on which the
word was first reported correctly. If S never
reported the word correctly, it was scored 8.
In addition, each response was assigned to one
of four categories: C—a correct response;
M—a meaningful word, but an incorrect
response; N—a nonsense response, i.e., a letter
or sequence of letters without dictionary meaning;
B—a blank, i.e., a failure to respond.
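
The threshold score defined above lends itself to a small worked sketch. The function name `threshold_score` and the matching rule (exact spelling, ignoring case and surrounding spaces) are assumptions introduced for illustration; the text does not say how spelling variants were handled.

```python
from typing import Sequence

NEVER_CORRECT = 8  # score given when the word was never reported correctly


def threshold_score(responses: Sequence[str], target: str) -> int:
    """Return the number (1-7) of the first trial with a correct transcription, else 8."""
    for trial, response in enumerate(responses, start=1):
        # Assumed matching rule: same spelling, case-insensitive.
        if response.strip().lower() == target.strip().lower():
            return trial
    return NEVER_CORRECT


# One S's seven responses to a single stimulus word (invented example data).
responses = ["", "bar", "barn", "barn", "barn", "barn", "barn"]
print(threshold_score(responses, "barn"))  # 3
```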







TABLE 2 


Scores of Recognition for the Four Frequency Ranges of Stimuli


English Experiment 


SD | Mean SD 


English 


Foreign 


52 | 610 | .52 
Bilingual : 40 | 6.16 71 
| 6,84 70 


Frequency Ranges 


10-33 100-330 


Mean | SD 


41.41 | 4.73 
5.54 54- 
5.87 71 


French Experiment 


Mean 


French 


6.72 
* Liberal scoring of response to Petite. 


French Experiment 


Test materials.—The French experiment was
similar to the English one but was less extensive.
The words used are given in Table 1. They are
disyllabic words, six letters long, selected from
the count of oral usage made for Le Français
Élémentaire (3).⁵ The French count is based
on only 312,000 running words, and we could
use a range of only two and one-half log units
of frequency for the experiment. The words
were chosen to have frequencies as close as
possible to 1, 10, 100, and 330, while avoiding
undue similarity among items. Four other
words were used as fillers. The 20 words were
arranged in seven randomly ordered lists.
Starting with a signal-to-noise ratio of −12 db
for the first list, the masking noise was reduced
by 3 db for each successive list; the last list was
recorded in the quiet. The word Ecrivez was
used as the carrier phrase. The recording was
made by a professional French actor at the
studios of the Centre d’Etudes de Radio-Télévision
in Paris.⁶

Frequency Ranges

Mean SD

5.48 .58
(5.28)* (.59)*

⁵ This word count, then unpublished, was
made available to us by Professeur G. Gougenheim,
Directeur du Centre d’Etude du Français
Élémentaire. We are grateful to him and to
Professeur P. Rivenc, Directeur-adjoint, for
helping us to use the materials prepared by the
Centre.

In order to present the test to the main group 
of Ss at the Sorbonne, it was necessary to re- 
record the tape, and some noise and distortion 
were unfortunately introduced at this step. 
A preliminary test had been made with six 
Ss, using the original tape in the recording 
studio. Comparison of their results with those
of the main group shows that the re-recording
raised thresholds by about 5 db.

Subjects.—The Ss were students in a course 
in experimental psychology at the Sorbonne. 
They were run in a single group. The Ss did 
not know the purpose of the experiment. The 
answer sheets were scored for only those stu- 
dents whose native language was French—a 
total of 60 Ss. 

Procedure.—The instructions—similar to 
those in the English experiment—were recorded 
by the speaker, and examples of words masked 
by noise were also given before the test. No 
recall test was given to this group. The scoring 
procedure was the same as for the English 
experiment. 

⁶ We are grateful to Monsieur Jean Tardieu,
Director du Centre d’Etudes de Radio-Télé- 
vision, at whose studios the recording was made. 







RESULTS 
Thresholds 


English experiment.—Thresholds 
were computed separately for the 
English Group, the Bilingual Group, 
and the Foreign Group. The sum 
of the thresholds for all of the words 
of a frequency range was determined 
for each S. The mean _ threshold 
per S per word for each frequency 
range is presented in Table 2. Let 
us consider first the English Group 
which formed the bulk of the Ss. 
Their thresholds decrease steadily 
with increasing frequency of usage. 
An analysis of variance of their 
threshold scores showed that the 
differences among words of the four 
frequency ranges were highly significant
(3 and 258 df; F = 503.67;
P < .001). Using Tukey’s gap test
(9), we found that each of the suc- 
cessive differences was significant. 
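
The degrees of freedom reported above can be checked with a few lines. The sketch assumes a treatments-by-subjects layout in which every S contributes one score per frequency range; the text does not name the design, so this is an inference from the reported values, and the function name is hypothetical.

```python
from typing import Tuple


def treatments_by_subjects_df(n_subjects: int, n_conditions: int) -> Tuple[int, int]:
    """Degrees of freedom for the condition effect and its error term."""
    effect_df = n_conditions - 1
    # Assumed error term: the conditions x subjects interaction.
    error_df = (n_subjects - 1) * (n_conditions - 1)
    return effect_df, error_df


# English Group: 87 Ss, four frequency ranges.
print(treatments_by_subjects_df(87, 4))  # (3, 258)
```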

The thresholds of the Bilingual
and Foreign Groups are higher than
those of the English Group, but
the relationships between thresholds
and word frequency are similar in
all groups. An analysis of variance
was made to compare the three
groups.⁷ The variance due to groups
of Ss was highly significant (2 and
106 df; F = 13.76; P < .001). The
interaction between groups and
frequency ranges fell short of significance.
In order to test the significance
of the differences between
individual group means, we used
the method of allowances (cf. 9,
pp. 304-307). The difference between
the English and the Foreign
Groups was significant beyond the
.01 level; between the Bilingual and
Foreign Groups, at the .01 level.
The difference between the English and
Bilingual Groups was not significant.

⁷ The numbers of Ss in the three groups
are, of course, quite different. However,
since the subclass numbers are proportional, it is
possible to perform a conventional analysis
of variance (see Snedecor [14, p. 281]).

Fig. 1. Thresholds of individual words
versus their frequencies of usage, plotted
separately for the English and French experiments.


The thresholds of native speakers
for all words are shown in Fig. 1.
In the English experiment the least
frequent words were perceived correctly,
on the average, only on the
last trial. The most frequent words
were perceived, on the average, on
about the fourth trial. Thus it appears
that the most frequent words
could be perceived, on the average,
through about 12 db more noise than
could the least frequent words, though
it must be recognized that the
sequential nature of the trials may
have influenced this result. The relation
between threshold and log
frequency is approximately linear.
The correlation between threshold
and log frequency is −.78.⁸ It may
be interesting to consider only the
words drawn from the Psycho-Acoustic
Laboratory PB lists (the three
categories of highest frequency), since
these lists are phonetically balanced
and are used widely in articulation
testing. This selection from the experimental
series of course restricts
the range and lowers the correlation.
In this case the correlation becomes
−.62; it is significantly different
from 0 beyond the .01 level.

⁸ This correlation is similar to correlations
ranging from −.68 to −.75 reported by Howes
and Solomon (5) between frequency of usage and
visual thresholds.

Fig. 2. Percentages of responses in the English experiment falling into each of the four response
categories, graphed separately for each frequency range of stimuli. Each curve represents the
sum of all the categories whose initials lie below the curve. [Axes: Percent Responses against
Signal-to-Noise Ratio in Decibels.]

French experiment.—The sum of 
the thresholds for the words of each 
frequency range was determined for 
each S. The mean thresholds per S 
per word for the four frequency 
ranges are presented in Table 2. 
For the first three frequency ranges 
the thresholds in the French experi- 
ment are similar to those in the 
English experiment, and there is a 
regular decline of threshold with 
frequency. For the words of highest 
frequency, however, there is a small 
increase of threshold over that of 
the third range. It will be noted 
that two mean thresholds are pre- 
sented for the highest frequency 
range. These thresholds differ only
because of the scoring of one word,
the adjective Petite which many
Ss first heard as Petit. The lower
threshold is based on a more liberal
scoring in which either the feminine
or the masculine form was counted
as correct. An analysis of variance
was made of these data, based on the
strict scoring. As in the English
experiment, the variance due to
frequency ranges is highly significant
(3 and 177 df; F = 96.73; P < .001).
The gap test shows that all of the
differences are significant, including
the inversion at the last step. With
a second analysis of variance using
the more liberal scoring for Petite,
frequency remains highly significant,
but the final inversion is no longer
significant.

The thresholds for all French words 
are shown in the lower graph of Fig. 1. 
To make word frequencies in the 
two experiments roughly comparable 
on the graphs, the frequencies of 
the French words, based on a sample 
of 312,000, have been multiplied
by 14.4 in order to approximate the
4,500,000 word base of the English
sample. The least frequent words
in the French experiment had a
mean threshold of about 7; words
of the two categories of highest
frequency had mean thresholds of
about 5. The correlation between
threshold and log frequency is −.65,
which is significantly different from
0 at the .01 level. With the more
liberal scoring of Petite, the correlation
between thresholds and log
frequency becomes −.71; this is
very close to the correlation we found
in the English experiment.
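
The rescaling of the French frequencies and the correlation with log frequency can be illustrated as follows. This is a sketch only: the frequencies and thresholds in the lists below are invented for the example and are not the published values, and the helper names `rescale_french` and `pearson_r` are hypothetical.

```python
import math
from typing import Sequence

ENGLISH_BASE = 4_500_000  # words in the Lorge Magazine Count
FRENCH_BASE = 312_000     # running words in the French count


def rescale_french(freq: float) -> float:
    """Express a French frequency on the 4,500,000-word base."""
    return freq * ENGLISH_BASE / FRENCH_BASE


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Product-moment correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL data, invented for illustration (not the published values).
french_freqs = [1, 1, 10, 10, 100, 100, 330, 330]
thresholds = [7.1, 6.8, 6.3, 5.9, 5.2, 5.0, 5.3, 5.1]

log_freqs = [math.log10(rescale_french(f)) for f in french_freqs]
r = pearson_r(log_freqs, thresholds)

print(round(ENGLISH_BASE / FRENCH_BASE, 1))  # 14.4, the multiplier given in the text
print(round(r, 2))
```

Multiplying every frequency by the same constant shifts all log frequencies by the same amount, so the rescaling changes the position of the points on the graph but not the correlation.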


Distribution of Responses 


English experiment.—Fig. 2 presents
the results obtained in the English
experiment when the responses were
categorized as C (correct), M (meaningful
but incorrect), N (nonsense),
and B (blank). A separate graph
has been drawn for the responses
to stimuli of each frequency range.
The solid curves show the percentage
of correct responses on the successive
trials. Without exception, each solid
curve lies above the preceding one
at every point. The dashed curves
show the sum of the correct and the
meaningful responses, that is, the
percentages of all English responses.
Note that these curves are similar
in height and shape for all four
frequency ranges of stimuli. In other
words, Ss showed about the same
number of English responses for
all groups of stimuli, but there was
a greater tendency for these responses
to be correct when the stimuli were
frequent than when they were infrequent.

French experiment.—Figure 3 pre- 
sents the results of the French experi- 
ment when the same four response 
categories were used. The curves 
are similar to those of Fig. 2, except 
that there is a larger proportion of 
blanks. The initial masking effect 
was greater in the French experiment, 
and this may also have reduced the 
set to give words in the later lists. 
The dashed curves show the per- 
centage of responses that were French 
words, whether correct or incorrect. 
As in the English experiment, these 
curves are similar for all four fre- 
quency ranges of stimuli. 


Nonsense Responses 


Fig. 3. Percentages of responses in the French experiment falling into each of the four
response categories, graphed separately for each frequency range of stimuli. [Axes: Percent
Responses against Signal-to-Noise Ratio in Decibels.]

TABLE 3
Mean Frequency of Recognitions per Subject for Words Recalled and Not Recalled

[Column groups, each with Mean and SD: High-Frequency Words, Low-Frequency Words, All Words. Only fragments of the entries are legible:]
Recalled 2.00 | 63 | 5.
| 3. ; 35 | 1.
Not Recalled y ¥ | 5 | 1.53 | .35 36| .70

English experiment.—Only 3% of all
responses in the English experiment
were nonsense responses. The graphs
in Fig. 2 indicate that somewhat
more nonsense responses were given
to the stimuli in the lower two
frequency ranges than to those in
the higher two ranges. To test
the significance of this difference we
performed an analysis of variance.
The distribution of the rather rare
nonsense responses approximated a
Poisson distribution, and the Freeman-Tukey
transformation (√X + √(X + 1) [9])
was used to remove
heterogeneity of variance. In the case
of a Poisson distribution, the error
variance ideally becomes approximately
1.00 under this transformation
(9, p. 326). For our data, the error
variance of the transformed data is .92.
The difference indicated by the graphs
between words of lower and higher
frequency is highly significant (3
and 258 df; F = 17.70; P < .001).
Apparently it is somewhat easier to
base a nonsense response on a discriminated
fragment of a low-frequency
word than on a discriminated
fragment of a high-frequency word.
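
The statement that the transformed variance is close to 1.00 for Poisson counts can be checked numerically. The sketch below computes the variance of √X + √(X + 1) directly from the Poisson probabilities; the means 2 and 5 are arbitrary illustrative values, not estimates from the present data, and the function names are hypothetical.

```python
import math


def freeman_tukey(x: int) -> float:
    """Freeman-Tukey transform of a count: sqrt(X) + sqrt(X + 1)."""
    return math.sqrt(x) + math.sqrt(x + 1)


def transformed_variance(mean: float, max_count: int = 200) -> float:
    """Variance of the transformed count under a Poisson law (sum truncated at max_count)."""
    pmf = math.exp(-mean)  # P(X = 0)
    first_moment = 0.0
    second_moment = 0.0
    for k in range(max_count + 1):
        y = freeman_tukey(k)
        first_moment += pmf * y
        second_moment += pmf * y * y
        pmf *= mean / (k + 1)  # P(X = k + 1) obtained from P(X = k)
    return second_moment - first_moment ** 2


for poisson_mean in (2.0, 5.0):
    print(poisson_mean, round(transformed_variance(poisson_mean), 3))
```

For both illustrative means the computed variance lies near 1, which is the property the analysis relies on.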
French experiment.—In this ex- 
periment the nonsense responses were 
again rather rare, accounting for 
about 6% of all responses. As in the 
English experiment, Fig. 3 indicates 
the nonsense responses to have been 
given somewhat more frequently to 
stimuli of the two lower frequency 
ranges. We performed an analysis 


of variance of these data, using the 
Freeman-Tukey transformation. Here 
again the results of the English 
experiment are closely paralleled. 


Meaningful Responses 


English experiment.—The meaning- 
ful but incorrect responses were 
listed for each of the stimuli in the 
English experiment. The frequency 
of each meaningful response was 
then found in the Lorge Magazine 
Count. In general these responses 
were found to be of rather high 
frequency of occurrence in the lan- 
guage. Thirty-four per cent had 
frequencies equal to or greater than 
1000 per 4,500,000. Such a high 
frequency is shown by only 512 
words of the total of 30,000 in the 
Lorge Count. These results indicate 
that Ss tended strongly to restrict
their responses to words of high
frequency of usage.

French experiment.—The meaningful
responses were analyzed as in
the English experiment. Since the
French stimuli were bisyllables and
word frequency decreases with word
length, it was expected that the
meaningful responses would be of
lower frequency of occurrence than
those of the English experiment.
It was found that 15% had frequencies
equal to or greater than 1000 per
4,500,000. This high a frequency
was found for only 417 words in the
French count. Here too, then, Ss
tended to limit their responses to
rather common words.

Recall Test 


The results of the test of recall
given at the end of the threshold
test are presented in Table 3. The
stimulus words were divided into
four classes for each S: (a) Recalled
words of high frequency (frequency
> 100); (b) Non-recalled words of
high frequency; (c) Recalled words
of low frequency (frequency < 33);
and (d) Non-recalled words of low
frequency. For each S, we calculated
the mean number of times that
he had recognized words of each of
these classes. Table 3 presents the
mean numbers of recognitions per
S. It will be seen that within each
of the two frequency categories,
recall accompanied greater frequency
of recognition, and these differences
were found to be significant in an
analysis of variance. At the same
time, it should be noted that the
high-frequency words that were not
recalled had been recognized, on
the average, one-third more often
than the low-frequency words that
were recalled; i.e., recall increased
more slowly with exercise for high-frequency
words than for low-frequency
words. These results suggest
that frequency of usage played a
dual role in determining the recall
scores of the stimuli: (a) On the one
hand, the more frequent words were
recognized more often than the less
frequent words during the threshold
tests, and this greater exercise favored
recall. (b) On the other hand, the
more frequent words were probably
subjected to more interference than
the less frequent words during the
threshold test. Most of the responses
in the threshold test were words
of high frequency, and maximal
retroactive inhibition is favored by
approximate equality of learning of
original and interpolated material
(6, p. 417) and by similarity in their
association values (13). The two
hypothesized effects of frequency of
usage presumably canceled each other
out in this situation. The percentage
of recall, among Ss who had recognized
a word at least once, was 37.4
for the high-frequency words and
37.0 for the low-frequency words.


Discussion 


Role of frequency of usage.—In both
the English and the French experiments,
frequency of usage is clearly the most
important factor determining the differences
among thresholds, since it
accounts for about half the variance.
Furthermore, this is probably an underestimate
of the effect of frequency.
The only measures of frequency that
we could use are population averages,
and they do not reflect individual
differences in frequency of usage. It
is also pertinent to note that the frequency
measure in the English experiment
had to be based on written usage,
since no adequate count of oral usage in
English is available. Nevertheless, the
correlation between written frequency
and auditory thresholds is high. There
are two reasons for expecting written
frequency to approach auditory frequency
as a predictor of auditory thresholds:
(a) Counts of written and oral
frequencies made in French show fairly
close correspondences, although there
are characteristic differences (3). (b)
In an earlier experiment, we have shown
a significant relation between the frequency
of visual training and auditory
thresholds (12). It may nevertheless
be asked why the French experiment,
which did utilize an oral count of
frequency, did not show a somewhat
higher correlation between threshold
and frequency than the English experiment.
We may note two factors which
tended to reduce the correlation in the
French experiment relative to that of
the English experiment: (a) The sample
of items in the French experiment was
only half as large as that in the English
experiment, and it would therefore be
expected to be less reliable. (b) The
range of frequency in the French experiment
was only two and one-half log
units as compared with three and one-half
in the English experiment. We
have already seen that restricting the
range to two and one-half units in the
English experiment reduced the correlation
to a value below that of the French
experiment.
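
The phrase "about half the variance" rests on squaring the correlation coefficient. A brief sketch of that arithmetic, using the coefficients quoted in this paper (the helper name `variance_explained` and the dictionary labels are introduced here for illustration):

```python
def variance_explained(r: float) -> float:
    """Proportion of variance accounted for by a linear relation with correlation r."""
    return r * r


# Correlations between threshold and log frequency quoted in the text.
reported = {
    "English, all words": -0.78,
    "English, PB-list words only": -0.62,
    "French, strict scoring": -0.65,
    "French, liberal scoring of Petite": -0.71,
    "Howes, three-word sentences": 0.68,
}

for label, r in reported.items():
    print(f"{label}: r = {r:+.2f}, r squared = {variance_explained(r):.2f}")
```

The squared values run from roughly .4 to .6, which is the sense in which frequency of usage is said to account for about half the variance.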

As to the mechanism of the effect of 
frequency, we would offer these con- 
siderations: When the word is heavily 
masked, it is likely that only a part of it 
will be discriminated. The S bases his 
report on this fragmentary perception 
and his knowledge of the language. 
The responses in most cases were either 
correct or meaningful but incorrect; 
i.e., S had no difficulty in giving responses
that both resembled the stimulus and
conformed to his language. Furthermore,
the responses tended to be of
relatively high frequency of occurrence
in the language, and, for each stimulus,
a few words tended to account for a
rather large proportion of the meaningful
but incorrect responses. Thus it appears
that the masked stimulus plus S’s
language habits tend to favor a small
number of competing responses of relatively
high frequency. If the stimulus
word is itself of relatively high frequency
of occurrence in the language, it will
tend to be prominent among these
competitors, and thus it will tend to 
be reported on an early trial. If, on 
the other hand, the stimulus is of rela- 
tively low frequency of occurrence, 
it is unlikely to be among the main 
competing responses on the early trials. 
In this case more information—a less 
ambiguous stimulus of the later lists— 
will be needed to secure recognition of 
the rarer stimulus word. 

Effects of word length.—Since intelligi- 
bility has been shown to increase with 
word length (2), we selected for each 
experiment test items that were homo- 
geneous in syllabic length—monosyl- 
lables in the English experiment and 
bisyllables in the French experiment. 
Howes (4), although he does not mention
this point, also used a relatively
homogeneous group of items. The words
he used for his analysis, the least
frequent words in the sentences of Mason
and Garrison (8), were almost all
bisyllables. Using items that were
homogeneous in length, both Howes and we
found word frequency to account for
about half the variance in thresholds.
Black (1), on the other hand, used items 
that were heterogeneous in length, and 
he did not control for word length in 
his analysis of the effect of frequency. 
Though concluding that frequency of 
usage was a significant source of variance, 
he did not find it to account for a large 
proportion of the variance. The dif- 
ference between his results and ours 
may thus be due to uncontrolled effects 
of word length in Black’s experiment. 

It is interesting to note that although 
length of words is correlated negatively 
with their auditory thresholds, it is 
correlated positively with their visual 
thresholds (7). This difference may be 
related to a previous observation: “Auditory
discrimination of verbal stimuli
tends to be made in units such as
syllables. Visual discrimination is more
analytic. The S can effectively attend
to units of different sizes” (12, p. 218).
Thus a perceived fragment of an auditory 
stimulus tends to be completed in the 
response; the longer the auditory stimu- 
lus word, the greater the chance that 
S will perceive at least part of it and then 
complete the rest. A perceived frag- 
ment of a visual stimulus tends to be 
reported as such; the longer the visual 
stimulus word, the greater the chance 
that part of it will not be perceived 
and that the report will be incomplete. 

Applications.—Since frequency of usage
is the major determinant of threshold
variance, it should be controlled 
in experiments designed to test the 
effects on intelligibility of other factors 
such as word length, phonetic composi- 
tion, and length of the test list. Articu- 
lation tests might also be examined 
from this point of view. A sample 
of three of the PAL PB lists (Numbers 
1, 10, and 20 [2]) indicates that they 
are similar in frequency composition: 
About 8% of the words have frequencies 
of less than 10 on the Lorge Magazine 
Count. The ranges 10-33, 100-330, 
and 1000-3300 all contain about equal 


numbers of items. While the lists 







seem satisfactorily similar to each other, 
the test might be made more discriminat- 
ing for acoustic factors if the range of 
frequency of the items was reduced. 
This point seems especially pertinent 
when the articulation test is given to 
Ss who have had little or no preliminary 
training with the word lists, as in clinical 
audiometry. When considerable train- 
ing with the test lists has been given, 
frequency of usage may be less important 
in determining the thresholds. 

Lists of highly intelligible words are 
sometimes required for various purposes, 
e.g., alphabetic equivalents for tele- 
phonic, aviation or military use. In 
the preparation of such lists, an initial 
screening might be done in terms of 
relative frequency of usage. 

Recall test.—According to Osgood (11) 
an interference theory of forgetting 
must predict that meaningful items 
will be recalled less well than nonsense 
items. Meaningful material occurs more 


often than nonsense material, and thus 
the meaningful items of a test should suffer 
greater interference from preceding or in- 


terpolated material. Underwood and 
Richardson (17) have recently presented 
data that tend to support this hypothesis, 
using nonsense syllables of high and 
low association values. The stimuli 
that we used can be considered to vary 
in meaningfulness in the sense that the 
words of higher frequency are more 
familiar and have greater “meaning 
value” (10) than do words of lower 
frequency. Since our data suggest that 
susceptibility to interference varies di- 
rectly with frequency of usage, our 
findings are in agreement with Osgood’s 
analysis and with the results of Under- 
wood and Richardson. 


SUMMARY 


Parallel experiments were performed in 
English and French in order to determine 
whether the intelligibility of words masked by 
noise varies with the frequency of usage of the 
words. In each experiment, the items were 
homogeneous in syllabic length. Correlations 
between intelligibility scores and log frequency 
were -.78 and -.65 in the respective experiments,
indicating that frequency of usage 
accounts for about half the variance of the 
thresholds.* Nonsense responses were rare, but 
in both experiments they were given significantly 
more often to stimuli of low frequency than 
to stimuli of high frequency. Most of the 
incorrect responses were meaningful words, 
and these tended to be of high frequency of 
usage. In the English experiment, Ss attempted 
to recall the stimuli at the end of the threshold 
test. The high-frequency words, though they 
had been recognized more often during the 
threshold test, were not recalled better than 
the low-frequency words. This effect is at- 
tributed to interference among the many high- 
frequency responses made during the threshold 
test. 
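
The step from the reported correlations to "about half the variance" can be checked with a short calculation. The sketch below assumes only that the proportion of variance accounted for is the square of the correlation coefficient; the function name is illustrative and not part of the original paper.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance accounted for by a linear correlation r."""
    return r * r


# Correlations reported in the Summary for the two experiments.
reported = [("English", -0.78), ("French", -0.65)]

for label, r in reported:
    print(f"{label}: r = {r:+.2f}, r squared = {variance_explained(r):.2f}")
```

The two squared values fall on either side of one half, which is consistent with the wording of the Summary.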


REFERENCES 


1. Black, J. W. Accompaniments of word
intelligibility. J. Speech Hearing Disorders,
1952, 17, 409-418.

2. Egan, J. P. Articulation testing methods.
Laryngoscope, 1948, 58, 955-991.

3. Gougenheim, G., Rivenc, P., Michéa, R.,
& Sauvageot, A. L'Elaboration du
Français Elémentaire. Paris: Didier,
1956.

4. Howes, D. The intelligibility of spoken
messages. Amer. J. Psychol., 1952, 65,
460-465.

5. Howes, D. H., & Solomon, R. L. Visual
duration threshold as a function of
word-probability. J. exp. Psychol., 1951,
41, 401-410.

6. McGeoch, J. A., & Irion, A. L. The
psychology of human learning. New
York: Longmans, Green, 1952.

7. McGinnies, E., Comer, P. B., & Lacey,
O. L. Visual-recognition thresholds as
a function of word length and word
frequency. J. exp. Psychol., 1952, 44,
65-69.

8. Mason, H. M., & Garrison, B. K. Intelligibility
of spoken messages; liked and
disliked. J. abnorm. soc. Psychol., 1951,
46, 100-103.

9. Mosteller, F., & Bush, R. R. Selected
quantitative techniques. In G. Lindzey
(Ed.), Handbook of social psychology.
Vol. 1. Theory and method. Cambridge,
Mass.: Addison-Wesley, 1954.

10. Noble, C. E. An analysis of meaning.
Psychol. Rev., 1952, 59, 421-430.

11. Osgood, C. E. Method and theory in
experimental psychology. New York:
Oxford Univer. Press, 1953.

12. Postman, L., & Rosenzweig, M. R. Practice
and transfer in the visual and auditory
recognition of verbal stimuli.
Amer. J. Psychol., 1956, 69, 209-226.

13. Sisson, E. D. Retroactive inhibition: the
influence of degree of association value
of original and interpolated lists. J.
exp. Psychol., 1938, 22, 573-580.

14. Snedecor, G. W. Statistical methods applied
to experiments in agriculture and
biology. (4th ed.) Ames, Ia.: Collegiate
Press, 1946.

15. Thorndike, E. L. A teacher's word book
of 20,000 words. New York: Teachers
Coll., Columbia Univer., 1931.

16. Thorndike, E. L., & Lorge, I. The teacher's
word book of 30,000 words. New York:
Teachers Coll., Columbia Univer., 1944.

17. Underwood, B. J., & Richardson, J. The
influence of meaningfulness, intralist
similarity, and serial position on retention.
J. exp. Psychol., 1956, 52, 119-126.

* Word frequency has been found to account
for an average of 69% of intelligibility variance
for words of 3 to 11 letters in length, according
to an article which appeared after the present
paper had been accepted. See D. Howes, “On
the Relation Between the Intelligibility and
Frequency of Occurrence of English Words.”
J. acoust. Soc. Amer., 1957, 29, 296-305.


(Received November 16, 1956) 





Journal of Experimental Psychology 
Vol. 54, No. 6, 1957 


SELECTIVE SAMPLING IN 
DISCRIMINATION LEARNING¹


DAVID L. LA BERGE AND ADRIENNE SMITH 


Indiana University 


It is an explicit assumption of the 
Estes-Burke statistical learning 
theory (5) that in a two-choice learn- 
ing situation the probability of a 
response on a given trial is equal 
to the proportion of sampled stimulus 
elements connected to that response. 
The present experiment attempts to 
investigate a possible relationship 
between this probability assumption 
and the selective sampling of cues 
in a discrimination-learning situation. 

¹ The authors are indebted to C. J. Burke
who read the manuscript and made many
helpful suggestions.

In the simple learning situation an
O, in response to a signal, S, makes
a prediction, A₁ or A₂, that one of
two events, E₁ or E₂, will occur on
that trial. This situation may be
extended to cover discrimination
learning merely by introducing two
different signals, S₁ and S₂, and by
letting the reinforcing events be
contingent upon the signals. In an
article on discrimination learning (6),
Estes and Burke represent each
signal by a set of stimulus elements.
Some of the elements are exclusive
to each signal, and the rest are
common to both signals. During
the course of learning a discrimination
task in which the signals are
perfectly correlated with the reinforcing
events, the set of elements
exclusive to each signal becomes
progressively connected to separate
responses. The set of elements common
to the signals, however, becomes
connected to both responses in proportion
to the probabilities of the
reinforcing events. For example, if
E₁ and E₂ occur with probabilities
of .75 and .25 respectively, the proportion
of common elements connected
to A₁ approaches .75. Thus
common elements sampled during a
discrimination task become connected
to the responses according to a
partial reinforcement schedule.
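
The approach of the common elements toward .75 can be illustrated with a short sketch. It assumes the usual linear-operator form for the expected proportion of connected elements, p[n+1] = p[n] + theta * (pi - p[n]); the sampling rate theta, the starting value p0, and the function name are illustrative assumptions and are not taken from the article.

```python
def expected_proportion(n_trials: int, pi: float, theta: float, p0: float = 0.5) -> float:
    """Expected proportion of common elements connected to A1 after n_trials.

    pi    : probability of the reinforcing event E1 (.75 in the example)
    theta : assumed proportion of elements sampled per trial (hypothetical)
    p0    : assumed starting proportion connected to A1 (hypothetical)
    """
    p = p0
    for _ in range(n_trials):
        # Each trial moves p a fraction theta of the way toward pi.
        p += theta * (pi - p)
    return p


for n in (0, 5, 20, 100):
    print(n, round(expected_proportion(n, pi=0.75, theta=0.1), 3))
```

Whatever positive value of theta is assumed, the proportion converges on pi; theta only sets how quickly.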

It is convenient to designate two
sources of common elements in a
discrimination task. One set of common
elements may be associated
with the stimulus figures of S₁ and
S₂, and will be referred to as the
figure common elements. The other
set of common elements may be
associated with background stimuli,
such as the characteristics of the room,
sounds from the apparatus, postural
stimuli, etc. This set will be referred
to as the background common elements.

It is well known that organisms 
can discriminate two signals at a 
very high criterion, if the signals 
are sufficiently dissimilar and an 
appropriate number of trials is al- 
lowed. If, given a particular signal, 
the probability of a response could 
become virtually unity, then it follows 
from the probability assumption that 
O must be sampling only those ele- 
ments connected to that response. 
Common elements of either kind 
should not be included in this sample 
for some of them would be connected 
to the other response which would 
produce errors. Thus, an O responding
at or near unity in a discrimination
task should not be sampling
common elements.
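
The argument can be made concrete with a small numerical sketch of the probability assumption. The element counts below are hypothetical and chosen only for illustration; the function name is likewise an assumption.

```python
def response_probability(connected_to_a1: int, sample_size: int) -> float:
    """Probability of A1 as the proportion of sampled elements connected to A1."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return connected_to_a1 / sample_size


# Hypothetical sample: 10 elements exclusive to the signal, all connected to A1.
exclusive_only = response_probability(10, 10)

# Same sample plus 4 common elements, 3 of which (.75) are connected to A1.
with_common = response_probability(10 + 3, 10 + 4)

print(exclusive_only, round(with_common, 3))
```

Under these assumed counts, adding even a few common elements holds the response probability below unity, which is the reasoning behind the selective sampling hypothesis.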

Therefore, to provide firm evidence 
for the selective sampling of common 




elements in this situation, it is 
necessary to demonstrate two things: 
(a) It must be shown that Ss are 
not sampling a specific set of com- 
mon elements in the signal, which 
should be accomplished by testing 
Ss when they are at a near unity 
level of performance; and in addition 
(b) it must be shown that this set 
of elements is available for sampling 
throughout the discrimination task. 

In the present experiment, the 
status of the set of background 
common elements is tested by a 
procedure similar to that used by 
Schoeffler (14), Binder (2), and Peter- 
son (12). After a series of training 
trials, they presented a test stimulus 
which contained only a subset of the 
cues present on training trials, and 
they noted the responses given to 
this ambiguous stimulus. In the 
present experiment, the status of 
background common elements is test- 
ed during the entire course of learning 
by periodically presenting a blank 


signal, i.e., a stimulus containing all 
the features of the two main signals 
except the differentiating figures. The
proportion of A₁ responses to the
blank signal is taken as an estimate
of the proportion of background
common elements connected to A₁.

This method assumes that the 
status of these common elements is 
to be affected almost entirely by 
training trials and little if any by 
test trials. One way to reduce effects 
of test trials is to permit neither 
reinforcing light to follow the blank 
signal (1). It is not necessary in 
this experiment, however, to assume 
that nonreinforcement leaves com- 
pletely unchanged the status of the 
sampled elements. It is desirable
merely to minimize possible effects
of test trials in favor of training
trials. And in case nonreinforcement
does have some influence upon
sampled elements, this is minimized
by presenting blank signals on only
a low percentage of trials.

The test of the status of background
common elements used here is as
follows. A successive discrimination
is learned in which the signals are
perfectly correlated with the reinforcements.
The signal and reinforcing
event associated with A₁
occur with a probability of .75,
ignoring trials when neither event
occurs. After Ss have met a very
high criterion of learning, the probability
of occurrence of the signal
is changed to .25 for a series of
trials. If the background common
elements are being sampled on early
trials, the mean proportion of A₁
responses to the blank signal is
expected to approach .75. However,
once the Ss are responding at asymptote,
they should not be sampling
common elements thereafter, and
although the partial reinforcement
schedule is then changed, the mean
proportion of A₁ responses to the
blank signal should remain unchanged
at or near .75.


METHOD


Apparatus.—A darkened experimental room 
contained five booths facing a black screen. 
The E sat in the middle booth and inserted 
slides into a projector, while the four Ss sat in 
the other booths. The screen was approxi- 
mately 10 ft. from the eyes of Ss. Each booth 
contained a l4-in. lever and two reinforcing 
lights arranged on the right and left above the 
lever. The lever could be moved to the right 
or left, and contained a spring which returned 
it to center when released. 

A punched-tape programming device con- 
trolled the sequence of stimuli by flashing one 
of the lights in E’s booth to indicate which slide 
to insert in the projector. The programmer 
also controlled the projector lamp and the 
reinforcing lights in the booths. 

Stimuli.—The two pairs of stimuli used in 
this experiment are shown in Fig. 1. They 
were constructed by fixing two end points and 
moving the vertex point in a vertical direction. 







The three points were then joined by two lines. 
Because the movement of the point is con- 
tinuous, the stimuli could be made arbitrarily 
similar (9). 

Thus the pair in which the vertex points of
S₁ and S₂ practically coincide in position was
called the hard discrimination, and the pair in
which the vertex points of S₁ and S₂ are more
disparate was called the easy discrimination.
The figures were drawn on paper, photographed, 
and the negatives mounted on 2 X 2-in. slides. 
When projected on the black screen the figures 
appeared as white lines. The blank slide was 
completely opaque and projected nothing on 
the screen. A buzzer accompanied the presenta- 
tion of each slide, the blank slide included. 

Fig. 1. Stimuli used in the discrimination tasks
(panels: hard discrimination; easy discrimination).

Design.—The experiment was run in two
parts. The first part consisted of a hard discrimination
task and the second part consisted
of an easy discrimination task. Except for the
larger number of Ss used in the easy discrimination,
both parts were treated alike. Within
each part were three conditions which are
summarized in Table 1. Under the Discrimination
Change condition (DC), E₁ always followed
S₁ and E₂ always followed S₂. For the first
part of the series of trials, S₁ and S₂ appeared
with probabilities .682 and .227, respectively,
and for the second series of trials their respective
probabilities were changed to .227 and .682.
The Discrimination Non-Change condition
(DNC) was the same as the DC condition except
that S₁ and S₂ appeared with probabilities of
.682 and .227 throughout the entire series of
trials. For the Non-Discrimination Change
condition (NDC), E₁ followed S₁ with probability
.50 and E₂ followed S₂ with probability
.50. For the first part of the series of trials,
S₁ and S₂ appeared with probabilities of .682
and .227, respectively, and for the second series
of trials their respective probabilities were
changed to .227 and .682. The blank signal
was never followed by a reinforcing light.
Random sequences were made up in blocks
of 22 trials such that S₁ appeared in 15, S₂ in 5,
and the blank signal in 2 trials. The other
two restrictions were that there would be one
blank signal in each half block of 11 trials and
that there would be no two successive blank
signals.
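
The block restrictions can be sketched as a small generator. This is only one possible way to satisfy the stated constraints (rejection sampling), not necessarily the procedure used to prepare the punched tapes; the function name, the string labels, and the `prev_last` argument for checking across block boundaries are assumptions.

```python
import random


def make_block(rng, n_s1=15, n_s2=5, prev_last=None):
    """Return one 22-trial block: n_s1 'S1', n_s2 'S2', and 2 'blank' trials.

    One blank falls in each half block of 11 trials, and no two blanks are
    adjacent (including across the boundary with the previous block).
    """
    signals = ["S1"] * n_s1 + ["S2"] * n_s2
    half = (n_s1 + n_s2) // 2
    while True:
        rng.shuffle(signals)
        first = signals[:half] + ["blank"]
        second = signals[half:] + ["blank"]
        rng.shuffle(first)
        rng.shuffle(second)
        block = first + second
        check = ([prev_last] if prev_last is not None else []) + block
        # Reject the block if any two successive trials are both blank.
        if all(not (a == b == "blank") for a, b in zip(check, check[1:])):
            return block


rng = random.Random(0)
print(make_block(rng))
# Signal probabilities implied by the block composition.
print(round(15 / 22, 3), round(5 / 22, 3), round(2 / 22, 3), 15 / 20)
```

The block composition reproduces the probabilities in the text: 15/22 and 5/22 round to .682 and .227, the blank signal occurs on 2/22 (about .091) of trials, and ignoring blank trials the more frequent signal occurs on 15/20 = .75 of trials.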

The two main signals were each called S₁
and S₂ half the time under each condition.
There were two sets of sequences for each con- 
dition and each set was matched as much as 
possible across conditions. Conditions DC and 
DNC had identical sequences of signals and 
reinforcements for Blocks 1-7. Conditions DC 
and NDC had identical sequences of reinforcing 
events for all trials. 

For the hard discrimination, 16 Ss were 
assigned to each condition, and for the easy 
discrimination, 24 Ss were assigned to each 
condition. Pilot studies had shown that it 
was very difficult to obtain enough Ss who 
could meet a criterion of 40 errorless trials, so 
a less stringent criterion was adopted. Enough 
Ss were run to assure that each of the assigned 
number of Ss for the DC and DNC conditions 
met a criterion of one error or less in blocks 6 and 
7. Within each booth one of the two reinforcing
lights was designated E₁, the other E₂. The
side of the E₁ light was counterbalanced within
each subgroup of four subjects.

TABLE 1

EXPERIMENTAL DESIGN IN TERMS OF PROBABILITIES OF SIGNALS
AND OF SIGNAL-REINFORCEMENT COMBINATIONS

Condition    Blocks 1-7    Blocks 8-13    Blank Signal
DC           ...           ...            .091
DNC          ...           ...            .091
NDC          ...           ...            .091

Fig. 2. Curves for both hard and easy
discriminations, in terms of mean proportion
of A₁ responses in the presence of S₁ or S₂
(legend: hard discrimination, easy discrimination;
abscissa: trial blocks).
For the hard discrimination curves, each point
is based on responses of 16 Ss; for the easy
discrimination curves, each point is based on
the responses of 24 Ss. For both discriminations,
each point of the S₁ curves in Blocks
1-7 and the S₂ curves of Blocks 8-13 is based
upon 15 responses per S; and each point of the
S₂ curves of Blocks 1-7 and the S₁ curves of
Blocks 8-13 is based upon 5 responses per S.

Subjects.—The Ss were 207 students from
introductory courses in psychology and sociology.
Because it was necessary to meet a
very high learning criterion in this experiment,
many more Ss were run than could be used in
the analysis. For the hard discrimination, 119
Ss were run. Of these, 48 met criterion, 67
failed to meet criterion, and 4 were discarded
for procedural errors. For the easy discrimination,
88 Ss were run. Of these, 72 met criterion,
12 failed to meet criterion, and 4 were discarded
for procedural errors.

The Ss were run in groups of four and were 
assigned randomly to conditions and sequences 
within each part of the experiment. Parts I
and II of the experiment were run at approximately
the same time on the same population of
Ss.

Procedure.—After the subjects were seated, 
the overhead light was turned off and the following
instructions were read.

“Be sure you are seated comfortably. Your 
task in this experiment will be to predict which 
light in your compartment will go on. You 
are to make the prediction by pushing the lever 
to the right or to the left. Each trial will 
begin with the sound of a buzzer, and most of 
the time a figure will appear on the screen in 
the front of the room. About a second later
either the left or the right lamp in your compartment
will light for a moment. As soon
as you hear the buzzer, you are to guess whether 


the right or the left lamp will light on that 
trial and indicate your choice by pushing the 
lever in the proper direction. If you expect 
the left lamp to light, push the lever to the left; 
if you expect the right lamp to light, push the 
lever to the right; if you are not sure, guess. 
You should watch the figure on the screen as 
it will help you to make correct predictions. 

“Be sure to keep your hand on the lever 
at all times, make your choice while the buzzer 
is on, press the lever in the proper direction, 
hold it there for a moment, and then move it 
back to center before the buzzer goes off. 

“Now you will be given some practice trials. 
Are there any questions?” 

The practice trials consisted of a circle as the 
signal, which was given four times, followed 
by reinforcing lights in the order F,, Fy, Ez, 
F», and then the blank signal followed by neither 
light. Then two additional presentations of 
the circle were followed by FE, and Fy. After 
the practice trials the instructions were con- 
tinued. 

“Did you push the lever on each trial while
the buzzer was on, even on the trial when
nothing appeared on the screen? Occasionally
there will be trials on which the buzzer will not
be followed by either lamp, and you will not
know if your prediction was correct or not. In
any case, you must always make a prediction
when the buzzer goes on.

“Are you sure you understand all the instructions
so far? The rest of the trials will
have to be run off without any conversation
or other interruptions. Please make a prediction
on every trial even if it seems difficult.
Make a guess on the first trial, then try to
improve your predictions as you go along and
make as many correct choices as you can.
Ready?”

After the instructions were read, the sequence
of 284 trials was run off without interruption.
The time relationships were as follows. Duration
of the signal: 3 sec.; time between the
signal and the reinforcing light: 1 sec.; duration
of the reinforcing light: 1 sec.; time between
reinforcing light and next signal: 2 sec.


RESULTS 


The discrimination curves for the 
DC groups of the hard and easy 
conditions are shown in Fig. 2. 
The performances of both groups in 
Criterion Blocks 6 and 7 appear to 
approach but do not completely 
achieve 1.00, as was expected from 
pilot studies. An unexpected result 







appears in the marked decrement in 
performance of the hard discrimina- 
tion group when the signal probabili- 
ties changed, and there is a slight 
suggestion of a similar effect occurring 
in the easy discrimination group. 
However, performance in the critical 
blocks, 8-13, is clearly higher in the 
case of the easy discrimination group. 

The blank signal data from the 
test trials of the hard discrimination 
groups are shown in Fig. 3 and 4. 
For the criterion groups in Fig. 3, 
each point is based upon two re- 
sponses of 16 Ss, or a total of 32 re- 
sponses. For the noncriterion groups 
in Fig. 4, the points of the DC and 
DNC groups are based upon 60 and 
74 responses, respectively. 

Fig. 3. Curves for the hard discrimination
criterion groups in terms of mean proportion
of A₁ responses in the presence of the blank
signal (ordinate: proportion of A₁ responses;
abscissa: 2-trial blocks). Each point is based
on 16 Ss, 2 responses per S.

Fig. 4. Curves for the hard discrimination
noncriterion groups in terms of mean proportion
of A₁ responses in the presence of the blank
signal (ordinate: proportion of A₁ responses;
abscissa: 2-trial blocks). Each point of the DNC
and the DC curves is based upon 37 and 30 Ss
respectively, 2 responses per S.

For the hard discrimination groups
in Fig. 3, the DC and NDC curves
appear to drop after the shift in
signal probabilities at Block 8, compared
to the DNC control group.
A test of the differences between the
three conditions shown in Fig. 3
was set up in the following way. For
each S, the ten responses from
Blocks 3-7 were combined and compared
with the ten responses from
Blocks 9-13. These sets of blocks
were chosen because they seemed
to represent more stable rates of
responding, after the expected change
in rate at the beginning of learning
and after the change in the partial
reinforcement schedule. A distribution
of differences was compiled for
each condition and the means of these
differences compared. The variances
were not homogeneous and therefore
the Cochran-Cox approximation to
the t test (4) and the Lewis exact
procedure (11) were used.
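
For readers unfamiliar with the Cochran-Cox approximation mentioned above, a minimal sketch follows. It uses the textbook form of the procedure: the t' statistic is computed with separate variance estimates, and its critical value is a weighted average of the two ordinary critical t values. The data are hypothetical, the function name is an assumption, and the critical values must be supplied from a t table.

```python
import math
from statistics import mean, variance


def cochran_cox(x, y, t_crit_x, t_crit_y):
    """Return (t_prime, critical_value) for two samples with unequal variances.

    t_crit_x, t_crit_y: tabled critical t values for len(x)-1 and len(y)-1 df.
    """
    w1 = variance(x) / len(x)  # squared standard error of the first mean
    w2 = variance(y) / len(y)  # squared standard error of the second mean
    t_prime = (mean(x) - mean(y)) / math.sqrt(w1 + w2)
    critical = (w1 * t_crit_x + w2 * t_crit_y) / (w1 + w2)
    return t_prime, critical


# Hypothetical difference scores for two conditions (not the article's data).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_prime, critical = cochran_cox(group_a, group_b, 2.776, 2.776)  # df = 4, .05 level
print(round(t_prime, 3), round(critical, 3), abs(t_prime) > critical)
```

When the two groups are of equal size, as with 16 Ss per condition here, the weighted critical value reduces to the ordinary tabled value for n - 1 degrees of freedom.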

The results of both t tests showed 
the mean change of the DNC groups 
to be significantly different (P < .05) 
from that of the DC or the NDC 
groups. However, the mean change 
of the DC group was not significantly 
different from that of the NDC 
group by either test. 

The blank signal data from the
easy discrimination group are shown
in Fig. 5. Each point is based on
two responses of each of 24 Ss,
totaling 48 responses. The curves
for the noncriterion easy discrimination
groups were not included because
of their unreliability due to the small
number of cases involved.

Fig. 5. Curves for the easy discrimination
criterion groups in terms of mean proportion
of A₁ responses in the presence of the blank
signal (ordinate: proportion of A₁ responses;
abscissa: 2-trial blocks). Each point is based
upon 24 Ss, 2 responses per S.

The important findings for the
easy discrimination groups in Fig. 5
are that after the shift in signal
probabilities, the DC and the DNC
curves do not change but the NDC
curve does change. The test of the
differences between the three conditions
was set up in the same manner
as that of the hard discrimination
groups. The mean change of the
DC group was not significantly
different from that of the DNC
group by either t test, but the mean
change of the NDC group was
significantly different from that of
the DC or the DNC groups by both
t tests.

From an inspection of the levels 
attained by the blank trial data in 
Fig. 3, it appears that all the hard 
discrimination groups except possibly 
the NDC group, fall below .75 as an 
asymptote for the early prechange 
trials. And on postchange trials, 
neither the DC nor the NDC groups 
appear to have reached .25 by the 
last block, although the DNC group 
seems to be moving nearer to .75 


by the last block. Similarly for 
the easy discrimination groups in 
Fig. 5 it appears that the NDC group 
falls below the .75 and .25 asymptotes 
under the pre- and postchange condi- 
tions, respectively. However, it may 
well be that an insufficient number 
of trials under constant conditions 
were allotted to permit the curves 
to reach asymptote, and therefore 
it would seem inappropriate to make 
asymptote tests on these data. 

It is meaningful to compare the 
blank signal curves of the hard and 
easy discriminations on early trials. 
Considering the first seven blocks 
alone, the levels of responding for the 
DC and DNC groups taken together 
appear consistently higher in the case 
of the hard discrimination. A t test of
the combined over-all difference is
significant at the .01 level, after using
an arcsin transformation of the raw 
scores to homogenize the variances 


(7). 
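
The arcsin transformation referred to here can be sketched as follows. One common form is 2 * arcsin(sqrt(p)); whether this exact form or a scaled variant was used in the analysis is not stated, so the form and the function name are assumptions.

```python
import math


def arcsine_transform(p: float) -> float:
    """Angular transformation of a proportion p, in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return 2.0 * math.asin(math.sqrt(p))


for p in (0.10, 0.50, 0.75, 0.90):
    print(p, round(arcsine_transform(p), 3))
```

The transformation stretches proportions near 0 and 1, where binomial variances shrink, so that the variances of the transformed scores are more nearly equal across groups.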

DISCUSSION


From the discrimination curves of the 
hard group it is apparent that although 
they were near perfect performance on 
Blocks 6 and 7 they were well below 
criterion during Blocks 8-13. Since 
they were not at asymptote during 
this final period they could easily have 
been sampling common elements as 
indicated in the drop of the DC curve 
in Fig. 3. In fact, even if Ss were 
responding at .99, the few common 
elements responsible for the last .01 
could presumably be sampled on blank 
trials and account for the drop in the 
curve. There is nothing in the data 
which indicates that the size of the 
set of common elements could not be 
small relative to the total number of 
elements. Although the hard DC group 
failed to maintain a high level of per- 
formance during the critical blocks 8-13 
and appeared to be sampling common 
elements, it may still provide data 
relevant to the main hypothesis. When 







the DC curves of the criterion and non- 
criterion groups are compared, the slope 
of the noncriterion group appears steeper 
than that of the criterion group. If 
the noncriterion group were sampling 
more common elements than the cri- 
terion group, then they would be ex- 
pected to show a higher rate of change 
to the blank signal when the partial 
reinforcement schedule changed. This 
finding, while not conclusive, is consistent 
with the main hypothesis. 

The discrimination curve for the easy 
group reaches a high level of performance 
early and appears to remain consistently 
near asymptote during Blocks 8-13. 
The finding that the blank trial responses 
of the DC group did not change, as 
shown in Fig. 5, suggests that Ss were 
not sampling a detectable amount of 
background common elements during 
that period. In fact, the response levels 
of both the DC and DNC groups remain 
near .50 for the entire series of trials, 
suggesting that they were possibly 
ignoring these common elements from 
the start. That common elements were 
present to be sampled is implied by the 
performance of the NDC group, which 
was significantly affected by the change 
in partial reinforcement schedule. Thus, 
the results from the easy groups suggest 
that Ss were not sampling background 
common elements to a detectable extent 
when responding sufficiently near asymp- 
tote, although such elements were avail- 
able in the signal. Taken together, 
the data from the easy DC and NDC 
groups seem to support the selective 
sampling hypothesis. 

There is additional evidence for the 
hypothesis when the hard and easy 
discrimination groups are compared with 
respect to their blank trial curves for 
Blocks 1-7. The stimulating character- 
istics of background common elements 
were alike for all groups of the experi- 
ment, yet the hard and easy groups 
showed distinctly different responses to 
them almost from the beginning. The 
apparent explanation for this difference 
is that the hard groups were sampling 
common elements and the easy groups 
were not. 




There are at least two hypotheses
which may account for such selective
sampling. One hypothesis, proposed by
Restle (13), is that common elements
become “adapted” during training and
are no longer effective in determining
the responses. A second hypothesis,
developed by Wyckoff (15), is that common
elements disappear from the sample
because the “observing response” becomes
directed chiefly toward differential
elements. The results of this experiment,
however, do not clearly indicate
whether or not the process of discrimination
learning involves a progressive
reduction of common elements in the
sample. The elements do appear to
have dropped out for the DC and DNC
groups of the easy discrimination, but this
seems to have occurred almost at the first
trial, and not during subsequent trials.

An important subsidiary finding is 
indicated by the marked decrement in 
discrimination performance by the hard 
DC group when the signal probabilities 
were changed. There appear to be 
several possible sources of this sudden 
appearance of errors. Since the change 
in signal probabilities carried with it 
no change in the characteristics of the 
individual figures themselves, the errors 
may be due to differential elements which 
had not yet been sampled, or due to 
the appearance of new background 
common elements. Sets of unsampled 
elements, differential or common, are 
assumed to be connected to both responses,
and therefore when first sampled
increase the probability of errors. The
new common elements might conceivably
be associated with visual and
kinesthetic changes occurring from the 
shift of the more frequent light and the 
more frequent response to the other 
side of S’s booth. Another possible 
source of errors could be some change in 
the stimuli which control the observing 
response, producing an increase in the 
number of common elements in the 
sample. Finally, at least part of the 
decrement in performance may be due 
to an artifact in the selection of the 
criterion groups. Some of the Ss may 
have met the criterion in Blocks 6 and 7
by chance deviation from a lower level
of performance. Thus the group response
levels shown in Blocks 6 and 7
are probably an overestimation of their
actual level of performance, and therefore
some decrement would be expected
in Block 8. Such hypotheses are only 
suggestive, and further research is needed 
to determine the reason for the obtained 
decrement in performance when signal 
probabilities are changed. 
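
The selection artifact described above can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes, purely for illustration, that every simulated S has the same true probability of a correct response (0.80), that a block contains 20 trials, and that the criterion is 90% correct in each of two successive blocks; none of these values comes from the experiment.

```python
import random


def simulate_criterion_artifact(n_subjects=2000, true_p=0.80,
                                trials_per_block=20, criterion=0.90, seed=0):
    """Select simulated Ss who meet a criterion in two blocks, then score a third.

    All parameter values are hypothetical; they only illustrate the argument.
    """
    rng = random.Random(seed)

    def block_score():
        # Proportion correct in one block at a fixed true performance level.
        correct = sum(rng.random() < true_p for _ in range(trials_per_block))
        return correct / trials_per_block

    criterion_scores = []  # mean of the two criterion blocks, selected Ss only
    next_scores = []       # following block for the same selected Ss
    for _ in range(n_subjects):
        b6, b7, b8 = block_score(), block_score(), block_score()
        if b6 >= criterion and b7 >= criterion:
            criterion_scores.append((b6 + b7) / 2)
            next_scores.append(b8)

    def average(xs):
        return sum(xs) / len(xs)

    return average(criterion_scores), average(next_scores), len(criterion_scores)


criterion_mean, next_mean, n_selected = simulate_criterion_artifact()
print(n_selected, round(criterion_mean, 3), round(next_mean, 3))
```

Because the simulated Ss are selected on chance highs, their mean in the block after the criterion blocks falls back toward the true level even though nothing about the task has changed. This is the kind of decrement the text attributes to the selection of the criterion groups.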


SUMMARY 


This experiment was an attempt to investi- 
gate a certain implication of the probability 
assumption of statistical learning theory when 
applied to discrimination learning. The hypothesis
derived from the assumption was that
Ss responding at or very near asymptote would 
be selectively ignoring background common 
elements. 

The 207 Ss were given 284 trials on successive 
discrimination tasks of two levels of difficulty, 
both involving simple figures. Interspersed 
among the trials were test signals, involving 
background stimuli only. The mean proportion
of A₁ responses to these blank signals was taken
as a measure of the status of these common 
elements. When the partial reinforcement 
schedule associated with the background stimuli 
was abruptly changed, those Ss which were 
responding near asymptote did not change 
their level of response to the blank signals. 
Those Ss which were not as near asymptote 
changed their level of response to the blank 
signals. These results were considered favor- 
able to the selective sampling hypothesis. 

An additional finding was that a change 
in the probabilities of signals in a discrimination 
task produced a decrement in performance. 
The data gave no conclusive indication of the 
source of this sudden appearance of errors. 


REFERENCES 
1. Atkinson, R. C. An analysis of the effect
of nonreinforced trials in terms of statistical
learning theory. J. exp. Psychol.,
1956, 52, 28-32.

2. Binder, A. A statistical model for the
process of visual recognition. Psychol.
Rev., 1955, 62, 119-129.

3. Bush, R. R., & Mosteller, F. Stochastic
models for learning. New York: Wiley,
1955.

4. Cochran, W. G., & Cox, G. M. Experimental
designs. New York: Wiley, 1950.

5. Estes, W. K., & Burke, C. J. A theory of
stimulus variability in learning. Psychol.
Rev., 1953, 60, 276-286.

6. Estes, W. K., & Burke, C. J. Application
of a statistical model to simple discrimination
learning in human subjects. J.
exp. Psychol., 1955, 50, 81-88.

7. Freeman, M. F., & Tukey, J. W. Transformations
related to the angular and
the square root. Ann. math. Statist.,
1950, 21, 607-611.

8. Goodwin, W. R., & Lawrence, D. H. The
functional independence of two discrimination
habits associated with a
constant stimulus situation. J. comp.
physiol. Psychol., 1955, 48, 437-443.

9. LaBerge, D. L., & Lawrence, D. H.
Two methods for generating matrices of
forms of graded similarity. J. Psychol.,
1957, 43, 77-100.

10. Lawrence, D. H., & Mason, W. A. Systematic
behavior during discrimination
reversal and change of dimensions. J.
comp. physiol. Psychol., 1955, 48, 267-271.

11. Lewis, D. Quantitative methods in psychology.
Ann Arbor: Edwards Bros.,
1949.

12. Peterson, L. R. Prediction of a response
in verbal habit hierarchies. J. exp.
Psychol., 1956, 51, 249-252.

13. Restle, F. A theory of discrimination
learning. Psychol. Rev., 1955, 62, 11-19.

14. Schoeffler, M. S. Probability of responses
to compounds of discriminated
stimuli. J. exp. Psychol., 1954, 48, 323-329.

15. Wyckoff, L. B. The role of observing
responses in discrimination learning:
Part 1. Psychol. Rev., 1952, 59, 431-442.


(Received November 8, 1956) 





Journal of Experimental Psychology
Vol. 54, No. 6, 1957 


THE SEMANTIC DIFFERENTIAL AND MEDIATED
GENERALIZATION AS MEASURES OF MEANING¹


LEONARD LIPTON² AND RICHARD L. BLANTON


University of Kentucky 


While concept formation studies 
have led to some rules governing the 
formation of concepts (14), the role 
played by concepts or symbols in 
determining subsequent responses is 
still unclear and the issues are re- 
garded as complex. Osgood (8) and 
Noble (7) have shown that measur- 
able aspects of meaning may be 
treated as independent variables. Os- 
good has formulated a “mediation” 
theory to account for “semantic” 
or mediated generalization and has 
developed a set of scales called “The
Semantic Differential” to measure
certain aspects of connotative mean- 
ing (9). In most studies of mediated 
generalization (1, 2, 3, 4, 11) equiva- 
lence of past experience has been 
assumed for Ss. Yet such generaliza- 
tion effects may be so dependent 
on fortuitous elements of life history 
that between-Ss variability is quite 
large (4, p. 194). 

The present study is an attempt
to bridge the gap between concept
formation and mediated generalization
studies. It is proposed first
to study the relationship between
meaning as a dependent variable,
based on experimentally induced semantic
processes established by pretraining
on a concept formation
task, and mediated generalization
measured by a differential GSR
conditioning procedure. Second, the
Semantic Differential measures established
by pretraining will be compared
to the GSR gradients obtained by
conditioning. Third, it is hoped that
leads will be obtained for further
exploration of two complex issues:
(a) the effect of repetition of the
sign-significate pairing on the level
and stability of acquired meaning,
and (b) the relationship of the conceptual
ability of S to these meaning
variables. While moderate amounts
of repetition or overlearning have
little effect on the retention of
acquired concepts (12), little is known
about the effect of repetition on the
mediation process. The relationship
of conceptual ability to the level
and stability of acquired meaning is
also a matter of general interest (11).


¹ Part of a dissertation presented by the first
author to the Faculty of the Graduate School
at the University of Kentucky in candidacy
for the degree of Doctor of Philosophy. Thanks
are expressed to members of the committee for
advice and assistance, and to Mr. John Donahoe
for aid in the analysis of the data.

² Presently on the staff of the Veterans
Administration Hospital, Lexington, Kentucky.


METHOD


Design.—Table 1 represents the essential 
design used in the experiment. The Ss were 
assigned to one of two levels of ability (A), 
one of three levels of learning (L) and one of 
two conditioning groups (C). Within-Ss vari- 
ance was obtained to each of eight generalization 
stimuli (G) representing four points on each 
of two dimensions of meaning. 

Pretraining materials.—The Semantic Dif- 
ferential Scales used in the study were selected 
from those listed by Osgood as having high 
saturations on the three factors of evaluation, 
potency, and activity (10). Three seven-point 
scales for each factor were printed on each sheet 
of paper used in the scaling. 

The stimuli used in the concept formation 
task were abstract designs of the “common 
elements” type (see Fig. 1). Sixty-four cards 
were constructed, all differing except for the 
presence of one of eight cues. The relevant 
cues on 32 cards defined a dimension of circularity,
8 containing a full circle, 8 a broken
circle, 8 a semicircle and 8 an arc. The other
half of the deck represented a dimension of
angularity, the cues being a full triangle, an
incomplete triangle, a truncated triangle and a
simple angle. By preliminary study, the
distracting elements were arranged so as to
make the concepts approximately equally
difficult to learn.


TABLE 1
EXPERIMENTAL DESIGN
(4 Ss in each subgroup)

Gen.      |       C1 (Circles)        |       C2 (Angles)
Stimuli   |  A1 (High)  |  A2 (Low)   |  A1 (High)  |  A2 (Low)
          | L1  L2  L3  | L1  L2  L3  | L1  L2  L3  | L1  L2  L3
G1 to G8  | each S in every subgroup was tested on all eight stimuli

Subjects.—The Ss were 24 male and 24 female
students recruited from laboratory classes in 
introductory psychology at the University of 
Kentucky. The Hanfmann-Kasanin Test of 
abstracting ability was used to assign Ss to the 
two levels of ability used in the experiment. 
The mean of the error scores was 5.94, SD was 
2.29, and the groups differ significantly by t 
test (P < .001). The mean of the high ability
group was 4.04, of the low group 7.87. The Ss 
within the three learning groups were matched 
for ability. When the conditioning groups 
were established, means on the Hanfmann- 
Kasanin were 5.54 for the group conditioned to 
circularity and 6.33 for the group conditioned 
to angularity. The difference between these 
means is not significant (P > .30). The F 
for variances of the two conditioning groups was 
1.19.

Pretraining procedure.—Eight nonsense syllables
(JID, YEM, CEF, ZIL, SIJ, MEQ,
DAX, QOB) were selected from Glaze’s list of 
syllables having zero association value, and 
each S rated these on the Semantic Differential. 
The scaling procedure was demonstrated to 
each S with an example and he was instructed 
as follows: 

“Now I want you to rate each of these non- 
sense words. Put a check mark of any kind 
in the space that best characterizes the meaning 
or feeling that the word suggests to you. Rate 
each word on all the scales. You'll find a
separate page for every word, and the word
is typed at the top of the page. Mark them
quickly; your first impressions are the best.”

The Ss did not find the task difficult and all 
appeared to take it seriously. 

All Ss were then required to abstract the 
correct elements from the concept formation 
cards and associate the eight nonsense syllables 
to them. The cards were presented manually 
and in a fixed random order which was used for 
all Ss, with the restriction that no immediate 
repetition of the same syllable occurred. Each 
S was allowed to observe the 64 stimuli before 
recording of errors was begun. A random 
assignment of syllables to the eight cues was 
made for each S. One-third of the Ss learned 
the task to two, one-third to four, and one-third 
to six perfect repetitions. 

Immediately after learning, S re-rated the 
syllables on the Semantic Differential with 
instructions to mark the scales on the basis of 
their initial impressions. 

Conditioning apparatus.—The apparatus used 
for measuring the GSR was one designed by 
Nichols and Daroge (6). In addition to meter 
readings which are proportional to the percentage 
of change in skin resistance, a shock of .01-sec.
duration is provided which can be used as the 
UCS. 

Conditioning and extinction procedure.—The
S was seated facing a blank wall with the apparatus
and E behind him. The S was told
that a test of memory was to be made in which
a word list would be presented orally one word
at a time, and that he was to respond to each
word as quickly as he could with the first word
that came to his mind. He was also told that
physiological measurements would be made
through the electrodes, and that he would
receive an electric shock from time to time. He
was then given a series of five meaningful words
on which to practice association.


Fig. 1. Examples of card stimuli
for the circularity dimension.


For half the Ss, the nonsense syllable which 
each had associated to the complete circle was 
used as the CS. For the other half, the syllable 
associated with the complete triangle was used. 
Following a modification of the design described 
by Lacey and Smith (5), a 40-syllable list was 
devised. The two CS syllables each appeared 
eight times in the list, the six remaining syl- 
lables three times each, and two previously 
unused syllables (TOV, VAF) three times each. 
A different random order of the list was used
for each S. 
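
The composition of the 40-syllable list can be checked with a short sketch. It assumes that the two CS syllables are those a given S had associated with the complete circle and the complete triangle. Because syllables were assigned to cues at random for each S, the pair used below (JID and SIJ) is hypothetical, as are the function and variable names.

```python
import random
from collections import Counter

PRETRAINING_SYLLABLES = ["JID", "YEM", "CEF", "ZIL", "SIJ", "MEQ", "DAX", "QOB"]
NEUTRAL_SYLLABLES = ["TOV", "VAF"]  # previously unused syllables


def build_conditioning_list(cs_syllables, seed=None):
    """Return one random order of the 40-syllable list for a single S.

    cs_syllables: the two syllables treated as CS (8 presentations each);
    the other six pretraining syllables and both neutral syllables
    appear 3 times each.
    """
    rng = random.Random(seed)
    items = []
    for syllable in PRETRAINING_SYLLABLES:
        repeats = 8 if syllable in cs_syllables else 3
        items.extend([syllable] * repeats)
    for syllable in NEUTRAL_SYLLABLES:
        items.extend([syllable] * 3)
    rng.shuffle(items)  # a different random order for each S
    return items


# Hypothetical assignment: JID = full circle, SIJ = full triangle for this S.
word_list = build_conditioning_list({"JID", "SIJ"}, seed=1)
counts = Counter(word_list)
print(len(word_list), counts["JID"], counts["YEM"], counts["TOV"])
```

The counts account for the stated list length: 2 x 8 + 6 x 3 + 2 x 3 = 40.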

The UCS was administered to the palm of the
right hand approximately .5 sec. after E read
the CS syllable. A variable magnitude of
shock was used, the level for each S being
established by increasing the amperage until
he felt it to be annoying and increasing this
level by 1 ma. to insure a punishing stimulus.
The range of levels for all Ss was from 2 to 12 ma.

Immediately after the practice associations, 
the basic resistance of S was recorded and the
40 syllables were presented orally by E at 
intervals of 10 to 25 sec. The interval was 
varied to prevent anticipatory responses, and 
sufficient time was allowed for S’s resistance to 
return to the prestimulus level. Resistance 
immediately prior to the reading of the syllable 
and the minimum resistance following it were 
recorded for all words of the list. 

All conditioning occurred during the reading 
of the 40-syllable list and was immediately
followed by extinction. The S was not told 
that the shock would not occur. In all other 
respects, the procedure was identical with that 
used in conditioning and the same measurements 
were obtained. 

Treatment of the scaling data.—The preliminary
scaling data was scored by summing across
the three scales within the three factors. These
sums were analyzed for each of the syllables
and factors. Final scaling data was similarly
scored and the sums for the CS syllable and the
three related to it were analyzed by a mixed
factorial design involving 12 measures, four
on each factor for each S. Sums for all Ss
on all syllables were factor analyzed by the
method described by Suci (10).

Treatment of extinction data.—The extinction
trials were employed as measures of mediated
generalization. The percentage of ohmic change
was transformed to a log-conductance change
score by means of an enlarged version of the
nomogram provided by Nichols and Daroge
(6, p. 460). The mean of each S's log-conductance
change to neutral syllables was subtracted
from the mean of the change to each of the
eight pretraining syllables. These difference
scores were used in the analysis of variance. A
separate analysis was performed for first extinction
trials using scores similarly computed. To
test whether conditioning had taken place, the
mean GSR of each S to the CS syllable during
extinction was compared to his mean reaction
to the syllable used as CS for the other group.
It was found that one S had not conditioned
(P > .05) and he was replaced.
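
The difference scores can be sketched numerically. The authors read the transformation from a nomogram. The code below assumes instead that the same quantity can be computed directly from the two recorded resistances, taking conductance as the reciprocal of resistance, so that the log-conductance change equals log10(R_before / R_minimum). The function names and the resistance values are hypothetical.

```python
import math
from statistics import mean


def log_conductance_change(r_pre_ohms, r_min_ohms):
    """Log-conductance change for one presentation.

    Conductance is 1/R, so log(C_after) - log(C_before) = log10(R_pre / R_min).
    """
    return math.log10(r_pre_ohms / r_min_ohms)


def difference_scores(trials, neutral=("TOV", "VAF")):
    """trials maps syllable -> list of (resistance before, minimum after).

    Returns, for each pretraining syllable, its mean change minus the mean
    change to all neutral-syllable presentations for the same S.
    """
    neutral_changes = [
        log_conductance_change(pre, low)
        for syllable in neutral
        for pre, low in trials[syllable]
    ]
    baseline = mean(neutral_changes)
    return {
        syllable: mean(log_conductance_change(pre, low) for pre, low in pairs) - baseline
        for syllable, pairs in trials.items()
        if syllable not in neutral
    }


# Hypothetical readings for one S (ohms).
example_trials = {
    "JID": [(100_000.0, 10_000.0)],
    "YEM": [(100_000.0, 50_000.0)],
    "TOV": [(100_000.0, 100_000.0)],
    "VAF": [(100_000.0, 100_000.0)],
}
scores = difference_scores(example_trials)
print(scores)
```

Subtracting the neutral baseline removes each S's general reactivity to any spoken syllable, so that the remaining score reflects the reaction specific to the pretraining syllables.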


RESULTS 


Efficacy of matching.—Since the
concept formation task was the same
for both conditioning groups, Pearson
r was computed for each group
for the scores on the Hanfmann-Kasanin
and error scores on concept
formation. For the group conditioned
to angularity concepts, r was .44
(P < .03, one-tailed test). For the
circularity group, r was .28, which
fails to reach significance. The F
for variances of the two conditioning
groups on concept formation was
1.48 (P > .05). The mean of the
error scores on concept formation
was 27.71 for the circularity group
and 23.08 for the angularity group.
These means do not differ significantly
by t test.

Extinction data.—Trends for the
two types of CS over the eight
non-reinforced trials, and the trends
over the three trials for the non-CS
syllables of each category are shown
in Fig. 2. As can be seen, the Ss
conditioned to the angularity syllable
showed a higher level of reactivity
throughout the series than those
conditioned to circularity. The over-all
means are not significantly different,
however, as may be seen by
the examination of the analysis of
variance given in Table 2, in which
the variance attributable to the
difference between concepts (C) and
the concepts by generalization interaction
(C × G) are not significant.

The error term in the analysis
for between Ss is the MS between-Ss-within-groups,
and for within Ss,
the pooled Ss × G. In this analysis,
only the variance attributable to
generalization stimuli is significant
(P < .001, F = 50.2, df = 3 and
108).


Fig. 2. Mean log-conductance change to
CS and non-CS of each category for extinction
trials.


TABLE 2

ANALYSIS OF PGR DATA

Source

Between Ss
  Concepts (C)
  Learning (L)
  Ability (A)
  C × L
  Error b

Within Ss
  Generalization (G)

[The remaining rows and the df, MS, and F columns are not legible in the scan.]

The analysis of variance for the 
first extinction trials produced similar 
results. The F for generalization 
in this analysis was 33.14 (P < .001). 

The generalization curves for mean 
extinction level and first extinction 
trials are shown in Fig. 3. Curves 
for the two categories of concepts 
are shown separately. The curves 
show a decreasing strength of response 
with increasing stimulus distance. 
Since the abscissa scale is arbitrary 
the form of the function cannot be 
established. 

Scaling data.—For the sums obtained
tained from the preliminary scaling, 
means on the evaluative, activity, 
and potency factors for combined 
syllables were 3.97, 4.09, and 4.06, 
respectively, with SD’s of .81, 1.07, 
and .96, respectively. Since a scale 
value of 4.0 represents neutrality 
of meaning, it may be concluded 
that the Ss found the syllables 
relatively meaningless. 
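
The scoring of the preliminary scaling can be sketched as follows. The Method section states that ratings were summed over the three scales within each factor, while the means reported here lie near the neutral value of 4.0. The sketch therefore returns both the sum and the per-scale mean, on the assumption that the reported means are on the per-scale metric. The scale labels are placeholders, since the specific adjective pairs are not named in the text.

```python
NEUTRAL_VALUE = 4.0  # midpoint of a seven-point scale

# Placeholder labels; the actual adjective pairs are not given in the text.
FACTOR_SCALES = {
    "evaluation": ("evaluation_1", "evaluation_2", "evaluation_3"),
    "potency": ("potency_1", "potency_2", "potency_3"),
    "activity": ("activity_1", "activity_2", "activity_3"),
}


def factor_scores(ratings):
    """Score one syllable rated by one S on nine seven-point scales.

    ratings maps scale label -> integer rating from 1 to 7.
    Returns, per factor, the sum over its three scales and the per-scale mean.
    """
    scored = {}
    for factor, scales in FACTOR_SCALES.items():
        values = [ratings[scale] for scale in scales]
        if any(not 1 <= value <= 7 for value in values):
            raise ValueError("ratings must lie between 1 and 7")
        total = sum(values)
        scored[factor] = {"sum": total, "mean": total / len(values)}
    return scored


# A syllable rated at the midpoint on every scale scores as neutral.
neutral_ratings = {scale: 4 for scales in FACTOR_SCALES.values() for scale in scales}
neutral_scores = factor_scores(neutral_ratings)
print(neutral_scores["evaluation"])
```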

Variability on the final scaling
which might have been due to the
initial meanings of the syllables was
controlled by random assignment of
syllables to generalization stimuli
for each S. The analysis of the
data obtained from the final scaling
procedure is presented in Table 3.


Fig. 3. Mean log-conductance change to all
syllables on the generalization continuum. First
extinction trials and mean reaction during extinction
are presented separately.


TABLE 3

ANALYSIS OF VARIANCE OF THE
SEMANTIC DIFFERENTIAL

Source

Between Ss
  Form (F)
  Learning (L)
  Ability (A)
  F × L
  F × A
  L × A
  F × L × A
  Error b

Within Ss
  Gen. (G)
  Semantic factors (D)
  G × D

[The remaining interaction rows and the df, MS, and F columns are not legible in the scan.]


The only significant source of
variance in the inter-S analysis was
due to geometrical forms (F). Nonsense
syllables which had been associated
to degrees of angularity are
rated differently (P < .005) from
syllables representing degrees of circularity.
The only significant (P
< .05) simple source of variation in
the intra-S analysis was due to the
generalization continuum (G). The
scale values for the means of the
concepts were plotted against an
arbitrary abscissa and the curve is
shown in Fig. 4. Again a decreasing
function appears, although, since the
last three points are not significantly
different from each other or from the
level of meaning obtained in the
preliminary scaling, little can be
said about the meaning of this curve.
One reason for this ambiguity may
be found by examining the forms by
generalization interaction (F × G)
which is also significant (P < .05).
While the same source of variance
obtained from the GSR data was
not significant, suggesting that mediated
generalization curves for angularity
and circularity syllables were
similar, the amplitudes of meaning
reactions to the syllables in scaling
were not the same for the two stimulus
dimensions. Examination of the
Forms by Semantic factors (F × D) interaction
(P < .001) shows that while
the circular components became better,
less active and weaker with increasing
distance from the full circle, the
angular components tended to increase
in activity and potency with
increasing distance from the full
triangle. This finding accounts, at
least in part, for the significant
higher order interactions in Table 3.


Fig. 4. Generalization of connotative meaning
summed over forms and meaning factors.


Fig. 5. Clusters isolated in factor analysis of
Semantic Differential ratings after pretraining.


The results of the factor analysis 
of the scaling data, taken to the first 
two factor loadings, appear to support 
the analysis-of-variance results. Two 
clearly defined contributions to the 
total variability were isolated, one 
cluster of points establishing a dimen- 
sion of circularity while the other 
corresponds to angularity. Some residual
variance remained which may
be due to a factor of completeness
or incompleteness of the figures to
which the syllables had been associated.
The unrotated factor structure
is shown in Fig. 5. Inspection
of this figure leads to the inference
that the dimension of circularity was
properly defined by the card stimuli
while that of angularity was not.
The order of meaning for the former
is Gy, Go, Gs, Gs, and is consistent 
with decreasing amplitude of GSR 
reaction. The order of meaning for
the latter is Gs, Ge, Gy, Gs, and is
inconsistent with decreasing ampli- 
tude of GSR reaction. 


DISCUSSION


The results of this investigation sug- 
gest that both concept formation and 
semantic generalization fit the mediation 
paradigm rather well. The meanings 
of initially meaningless words, whether 


measured by GSR conditioning or by 
Osgood’s scaling method, can be manip- 
ulated consistently by experimental pro- 
cedures. The study was admittedly 
exploratory, and several modifications 
of procedure would be advisable in 
additional investigations. A matched- 
groups design in which one group per- 
formed the final scaling and the other 
received the GSR conditioning would 
eliminate any effect which the scaling 
operation may have had on the GSR 
reactions, for example. The use of 
other types of concept formation tasks 
should be considered; and it might be 
possible to define an abscissa for the 
generalization curves by independent 
scaling operations on the objects to 
which nonsense materials are to be 
associated. 

Differences in conceptual ability were 
not found to be related to level of 
mediated generalization, which may be
due to lack of validity or reliability
of the Hanfmann-Kasanin. While the 
Ability groups differed significantly, the 
Hanfmann-Kasanin did not share much 
variance with the concept formation 
task (r = .36). Levels of repetition or 
overlearning as used in this study were 
not related to level of reaction, which 
might have been predicted had we con- 
sidered previous findings on retention 
of concepts to be relevant to the question 
of the level of meaning. 

An important finding is that the Osgood 
Semantic Differential has some validity 
for the measurement of meaning under 
strictly controlled experimental condi- 
tions in which meanings are induced by 
pretraining. The data it provides are 
multidimensional, but the dimensions 
are sufficiently identifiable to provide 
valid descriptions of induced changes. 
It may be that each of the systematic 
conditions of learning, e.g. reinforce- 
ment, repetition, etc., is associated with 
a source of variance measurable by 
Osgood’s method. The factor of evalua- 
tion, which accounted for the largest 
portion of the identifiable variance in 
Osgood’s original study is probably 
related to the conditions of reward or 
punishment actually or symbolically 







associated with the word during the 
time its meaning was being acquired. 
Some confirmatory evidence for this 


has been obtained in a recent study by 
Smith (13). 


SUMMARY 


Two groups of 24 Ss, obtained by matching 
pairs on their Hanfmann-Kasanin Test scores, 
scaled eight nonsense syllables on the Semantic 
Differential. They then associated these syl- 
lables to a set of 64 stimulus cards, half of which 
contained one of four degrees of circularity and 
half one of four degrees of angularity as common 
elements. After this concept formation task, 
the syllables were re-scaled on the Semantic 
Differential, and Ss were presented a list of 40 
syllables composed of repetitions of the original 
eight plus two neutral ones. The syllable 
associated with the full circle was paired with 
shock for one group for eight presentations within 
the 40-syllable list. For the other group, the 
syllable associated with the full triangle was 
used. GSR measures were taken on all pre- 
sentations of related and neutral syllables during 
a comparable extinction series. Results were: 

1. The amplitude of mediated or semantic 
generalization was an inverse function of the 
difference between the original stimulus figures 
to which the syllables had been associated 
whether measured by the GSR or by the Seman- 
tic Differential, although inability to fix points 
on the abscissa leaves the form of the gradient 
in doubt. 

2. Since the Semantic Differential provides 
multidimensional measures, the obtained gradi- 
ent probably holds only for a stimulus dimension 
clearly defined by the characteristics of the 
figures to which associations are made. 

3. Differences in conceptual ability and 
level of learning were not found to be sig- 
nificantly related to level of generalization 
measured by either method. Reasons for these 
findings were briefly discussed. 

It is concluded that manipulation of experi- 
mental conditions can lead to predictable 
changes in meaning. Its description may be 
quite complex, however, when both the stimulus 




pattern of the object and the conditions under 
which meaning is acquired are considered. 


REFERENCES 


1. Cofer, C. N., & Foley, J. P., Jr. Mediated
generalization of verbal behavior:
I. Prolegomena. Psychol. Rev., 1942, 49,
513-540.

2. Diven, K. Certain determinants in the
conditioning of anxiety reactions. J.
Psychol., 1937, 3, 291-308.

3. Eisen, N. H. The influence of set on
semantic generalization. J. abnorm. soc.
Psychol., 1954, 49, 491-496.

4. Hull, C. L. Principles of behavior. New
York: Appleton-Century, 1943.

5. Lacey, J. I., & Smith, R. L. Conditioning
and generalization of unconscious anxiety.
Science, 1954, 120, 337-342.

6. Nichols, R. C., & Daroge, T. An electronic
circuit for the measurement of the
galvanic skin response. Amer. J.
Psychol., 1955, 68, 455-461.

7. Noble, C. E. An analysis of meaning.
Psychol. Rev., 1952, 59, 421-430.

8. Osgood, C. E. The nature and measurement
of meaning. Psychol. Bull., 1952,
49, 192-239.

9. Osgood, C. E., & Sebeok, T. A. (Eds.)
Psycholinguistics: a survey of theory
and research problems. J. abnorm. soc.
Psychol., 1954, 49 (4, Pt. 2, Suppl.).

10. Osgood, C. E., & Suci, G. J. Factor
analysis of meaning. J. exp. Psychol.,
1955, 50, 325-338.

11. Razran, G. Stimulus generalization of
conditioned responses. Psychol. Bull.,
1949, 46, 337-367.

12. Reed, H. B. Factors influencing the
learning and retention of concepts.
J. exp. Psychol., 1946, 36, 71-87.

13. Smith, P. A. Some effects of reward and
punishment on visual perception. Unpublished
doctor’s dissertation, Univer.
of Kentucky, 1955.

14. Vinacke, W. E. The psychology of thinking.
New York: McGraw-Hill, 1952.


(Received October 22, 1956) 





Journal of Experimental Psychology
Vol. 54, No. 6, 1957 


CUTANEOUS DISCRIMINATION OF RADIANT HEAT¹


WARREN H. TEICHNER


Quartermaster Research and Development Center 


Methodologically, recent studies of 
cutaneous sensitivity to radiant heat 
which have led to theoretical develop- 
ments in the psychophysiology of 
the skin have been of two major 
kinds, those which have used an 
ascending method of limits, usually 
holding exposure time constant while 
varying intensity (e.g. 4) and those 
which have presented S with a 
constant intensity for an extended 
exposure time during which time S 
reports his feelings as they occur 
(5). Both methods have advantages 
and disadvantages, but, in general, 
results obtained agree in indicating 
that heat transfer parameters and 
skin temperature phenomena may be 
relatable to subjective response cate- 
gories of various sorts. However, 


these two types of studies have not 


agreed in their indication of the 
specific relationships between heat 
transfer characteristics and subjective 
responses. Use of each has led to 
at least two different sets of inferred 
relationships and, as a consequence, 
strong disagreement in basic theory 
of cutaneous sensitivity. On the 
one hand one finds the notion of 
receptor and response specificity (3, 
4); on the other, the notion that all 
cutaneous sensation is an intensity 
variation of a single response dimen- 
sion mediated by nondifferentiated 
receptors (5, 8). 

The present study was an attempt
to apply a third psychophysical
method to the study of thermal
sensitivity, one which appeared to
combine desirable aspects of the
other two with no loss in reliability,
and to reinvestigate some of the
important thermal response relationships.
This was done by presenting
S with stimuli which varied in both
intensity and exposure time and
allowing him several categories of
response, the method of single stimuli
(9). In addition to studying the
effects of heat intensity and exposure
time, the present investigation was
also concerned with the possible
effect of their product, the total
heat exchange or dosage, a characteristic
of the heat transfer which does
not seem to have been studied before
in this connection.


¹ The major results of this study were presented
at the 1956 meeting of the American
Psychological Association.


METHOD


A radiant heat stimulator designed in prin- 
ciple after that described by Hardy, Wolff, 
and Goodell (4) was used to provide the stimu- 
lation. The heat projector was a compact 
pistol-shaped device mounted on a stand and 
adjustable so that the low-conducting muzzle
end could be brought to normal with respect to 
the skin surface. The projector housed a G.E. 
100-w. projection bulb the light of which was 
focused to provide uniform illumination within 
a circle 1 cm. in diameter in the center of a 
total field of illumination of 2 cm. in diameter. 
The lamp intensity was controlled by a power- 
stat housed in a separate control unit. Line 
voltage was controlled with a voltage stabilizer 
at the line end and a variable transformer with 
a voltmeter housed in the control unit. On-off 
switching was accomplished through SPST 
switches one of which was mounted in the 
projector and one in the control unit. In 
the present study exposures were controlled 
by E with the control unit. Exposure duration
was selected and controlled with a Stoelting 
electronic interval timer which was wired into 
the switching circuit. 

The apparatus was calibrated prior to the
experiment and twice a week thereafter with
a calibrated thermopile and a Leeds-Northrup
recording potentiometer. Measures of skin
temperature were obtained with a Hardy dermal
radiometer which was calibrated against a
Leslie cube.

The experiment was conducted in a quiet
room containing only S and E. Room temperature
was determined at the beginning and
end of each experimental session. Prior to
an experimental session, S washed the volar
side of his wrists with soap and water, dried
them carefully and then, a few minutes later,
applied a coat of india ink approximately
1 cm. in diameter on the hairless portion of the
skin. About 5 to 10 min. after the ink was
thoroughly dry S’s skin temperature was
measured over the general inked area. That
is, the skin temperature measurements were
based upon radiation from both painted and
unpainted skin. When five successive identical
values of skin temperature were obtained, S’s
skin temperature was said to have stabilized
and stimulations were begun.

The two wrists were stimulated alternately. 
Both wrists were used since preliminary experi- 
mentation had indicated that the skin tempera- 
tures of the wrists for this type of stimulation 
should not differ except by chance.² Measure-
ments of initial skin temperature were taken 
before each stimulation to insure that skin 
temperature had recovered to its prestimulation 
or initial level. If recovery occurred before 
90 sec. since the last stimulation of the same 
site, further stimulation was withheld until 
a minimum 90-sec. interval had elapsed. If 
after 90 sec. recovery was not complete but 
was within 1.0° C. of the initial temperature, 
the stimulus was applied. 
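
The restimulation rule just described can be restated as a small decision function. This is only a sketch: the function name and arguments are hypothetical, and the text does not say what was done when the skin was still more than 1.0° C. from its initial level after 90 sec., so the sketch assumes that stimulation was simply withheld in that case.

```python
def may_stimulate(elapsed_sec, skin_temp_c, initial_temp_c,
                  min_interval_sec=90.0, tolerance_c=1.0):
    """Return True if a test site may be stimulated again.

    elapsed_sec    -- time since the last stimulation of the same site
    skin_temp_c    -- skin temperature measured now (deg C)
    initial_temp_c -- prestimulation (initial) skin temperature (deg C)
    """
    # Stimulation is always withheld until the 90-sec. minimum has elapsed,
    # even if the skin has already recovered.
    if elapsed_sec < min_interval_sec:
        return False
    # After the minimum interval, full recovery or a residual within 1.0 deg C
    # is accepted. Larger residuals are assumed (not stated in the text)
    # to call for further waiting.
    return abs(skin_temp_c - initial_temp_c) <= tolerance_c


print(may_stimulate(60, 33.0, 33.0))   # recovered early, but interval not yet over
print(may_stimulate(90, 33.8, 33.0))   # interval over, within tolerance
print(may_stimulate(120, 34.5, 33.0))  # interval over, outside tolerance
```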

Twenty-three different combinations of heat 
intensity and exposure time were used as a 
stimulus series. This series was presented 
twice within each of 10 daily experimental 
sessions, in a different random order in each 
session. Within each session, the series was 
presented first starting with the left wrist, 
then starting with the right wrist, both times 
in the same random order. The stimuli fell 
into three dosage groups (intensity × time) 
of 600, 800, and 1000 mc./cm.² The actual 
stimulus values are presented in Table 1. 
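
Because dosage is defined here as intensity × time, the exposure time for any stimulus follows from the other two quantities. The sketch below illustrates the arithmetic only; the three intensities are chosen for illustration and are not meant to reproduce the pairings of Table 1.

```python
def exposure_time_sec(dosage, intensity):
    """Exposure time (sec.) that delivers `dosage` (mc./cm.^2)
    at a constant `intensity` (mc./cm.^2/sec.)."""
    return dosage / intensity


# Illustrative intensities only (assumed, not the Table 1 pairings).
for dosage in (600, 800, 1000):
    times = [round(exposure_time_sec(dosage, i), 2) for i in (100, 200, 400)]
    print(dosage, times)
```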

At a verbal ready signal S placed his wrist 
against the fiber end of the projector; E pre- 
sented the stimulus shortly afterwards. The 
Ss were asked to attend to the situation as 
closely as possible and immediately after the 
heat was turned off to classify their final feeling 


² This was verified by the data obtained. 


The largest single instance of a difference 
between wrists in skin temperature during 
the entire experiment was 1.8° C. Most 
differences were considerably less than 1.0° C. 


TABLE 1 
STIMULUS CONDITIONS 
Intensity (mc./cm.²/sec.) | Exposure Time (Sec.) for Dosages of 600 mc./cm.², 800 mc./cm.², and 1000 mc./cm.² 


400 5 
320s 
240} 
200 
100 
133 
120 
107 
100 
80 
64 
71) 
48 


as one of the following: no sensation, warmth, 
heat, pricking pain, burning pain. The Ss 
were given no further descriptions or instruc- 
tions to aid them. Six Ss were used; three 
were professional psychologists and three were 
college students. All Ss had served previously 
in one or more single category experiments 
in which the same apparatus had been used. 


RESULTS


Each individual experimental ses- 
sion lasted an hour or more, one- 
half hour per random series approxi- 
mately, and it was therefore necessary 
to determine whether there were 
important uncontrolled changes in 
skin and air temperature during that 
time. Analysis of the air temperature 
measurements showed that the mean 
air temperature for the experimental 
sessions was 25.77° C.; the SD was 
3.06° C. This is approximately the 
center of the range of standard 
testing conditions of 20°-30° C. 
recommended by Hardy, Wolff, and 
Goodell (4). The mean increase in 
air temperature during the experi- 
mental sessions was .15° C. The 
product-moment correlation between 
increases in air temperature and in 
initial or baseline skin temperature 
during the sessions was .15. It 





WARREN H. TEICHNER 


would seem, therefore, that changes 
in air temperature during experi- 
mental sessions may be ignored as an 
important source of uncontrolled vari- 
ation in the results. 

The mean increase in initial skin 
temperature during the experimental 
sessions was .45° C.; the SD was 
.52° C. Since this increase is un-
related to changes in air temperature, 
it must have been due to incomplete 
dissipation of heat from the skin 
following radiation. Thus, it would 
seem that the 90-sec. minimum delay 
between restimulation of a test spot 
was not completely effective in allow- 
ing the skin to recover to its initial 
value. The size of increase, however, 
does not seem to be large enough to 
be considered an important source of 
variation in the data, and in any case 
such increase would be expected to 
have a random influence since the 
stimuli were randomized. 

Since none of the Ss had had 
previous experience in making thermal 
judgments in the manner of this 
experiment, and since no preliminary 
practice was given in the use of the 
five judgment categories, it was 
necessary to determine whether there 
was a practice effect on the judg- 
ments. The combined frequencies 
for all Ss were determined for each 
kind of response to each stimulus in 
each of the 10 sessions of the experi- 
ment. These frequencies tended to 
increase very slightly through the 
first half of the experiment and then 
to decrease during the second half, 
but not below the initial level of 
response frequency of the first half. 
It was concluded, therefore, that 
practice effects were negligible. Nev-
ertheless, further analysis of the data 
was in terms of the frequency of 
response for both wrists combined 
over all 10 sessions. 

The method of single stimuli (9) 
was used to determine intercategory 
thresholds and SD’s. Proportions ex- 
ceeding the range .05—.95 were not 
used. Values were obtained graphi- 
cally from a least squares fit of the 
z scores. This procedure was applied 
to the intensities within each dosage 
level only; exposure time values were 
obtained by simple calculation. The 
alternate procedure, that of determin- 
ing exposure time thresholds and 
calculating intensities was also per- 
formed as a check and found to 
yield the same numerical results 
within close limits. This procedure 
was successful in obtaining thresholds 
for all positive thermal categories 
except warmth at the highest dosage 
and burning pain at the lowest. 
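
The threshold procedure described above can be sketched numerically. The original values were read graphically from a least squares fit of the z scores; the version below substitutes an ordinary least squares line computed directly, and the function name, the data, and the reading of SD as the intensity change per unit z are assumptions made for illustration. The proportions are hypothetical and were generated from a normal ogive with a threshold of 200 and an SD of 50, so the fit should recover approximately those values.

```python
from statistics import NormalDist


def fit_threshold(intensities, proportions, lo=0.05, hi=0.95):
    """Fit z = a + b * intensity by least squares.

    Proportions outside lo..hi are dropped, as in the text.
    Returns (threshold, sd): the intensity at z = 0 and the
    intensity change corresponding to one z unit (1 / b).
    """
    std_normal = NormalDist()
    points = [(x, std_normal.inv_cdf(p))
              for x, p in zip(intensities, proportions)
              if lo <= p <= hi]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two usable proportions")
    mean_x = sum(x for x, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in points)
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return -intercept / slope, 1.0 / slope


# Hypothetical data: proportions of judgments at or above a category.
demo = NormalDist(mu=200.0, sigma=50.0)
xs = [120, 160, 200, 240, 280]
ps = [demo.cdf(x) for x in xs]
threshold, sd = fit_threshold(xs, ps)
print(round(threshold, 1), round(sd, 1))
```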

The thresholds obtained allow the 
relation of intensity to exposure 
time to be examined by comparing values across 
dosages. These relationships for each 
of the response categories are shown 
in Fig. 1. Although the relation- 
ships shown are not extensive, they 
are revealing. Heat, pricking pain, 
and burning pain lines suggest parallel 
relationships. If higher intensities 
had been used, the first burning pain 
value would probably have fallen 
at the shortest exposure time. Thus, 
the data indicate that as intensity 
increased—exposure time decreasing, 
first warmth then heat—then pricking 
pain and finally burning pain were 
eliminated from S’s response set. 
Warmth thresholds were obtainable 
only for low intensities and long 
exposures; burning pain only for 
high intensities and short exposures. 

Also shown in Fig. 1 are the three 
comparable pricking pain thresholds 
obtained by Hardy, Wolff, and Goodell 
(4), as read from their more extensive 
intensity-time pricking pain curve. 
Agreement of these values with the 
present ones is fairly good. 

To consider the effect of final 





[Fig. 1 plot. Ordinate: Intensity (mc./cm.²/sec.). Legend: Warmth; Heat; Pricking Pain; Burning Pain; Hardy’s Pricking Pain Values.] 





skin temperature, Buettner’s formula 
(1, 2) as refined by Lipkin and Hardy 
(6), was applied to each of the 


2760 observations of the experiment 
in order to calculate the final skin 
temperature that would be expected 
from the heat exchanges involved. 
Intercategory skin temperature thres- 
holds based upon all of these values 


were then determined, again with 
the method of single stimuli. The 
results along with the intercategory 
intensity thresholds of each dosage 
level, obtained from the first analysis, 
are shown in Fig. 2. 

Consider first only the ordinate of 
Fig. 2. This axis presents the inter- 
category skin temperature thresholds. 
Inspection of this axis shows that 
the temperature difference between 
response categories decreased succes-
sively from warmth to burning pain. 
The pricking pain threshold is 1.8° C. 
higher than that usually reported by 
Hardy (3, 4). Assuming Hardy’s 
value is the true mean, this difference 
yields a t significant beyond the .001 
level. 

FIG. 1. Intensity-exposure time relationships for cutaneous thermal sensations. 

Now consider only the four base- 
lines of Fig. 2. The three dosage 
abscissae are referred to the lowest 
abscissa for their intensity values. 
These abscissae and their values all 


FIG. 2. Cutaneous thermal sensations as a function of skin temperature, stimulus intensity, and dosage. [Plot: ordinate, Final Skin Temperature (°C.); abscissa, Intensity (mc./cm.²/sec.), 100 to 400; separate baselines for dosages of 600, 800, and 1000 mc./cm.²] 



are derivable from the data used to 
plot Fig. 1. Inspection of these 
values shows that within any par- 
ticular dosage level the intensity 
difference between response categories 
was approximately equal. In com- 
paring responses across dosages, it 
may be seen that as dosage increased 
the intensity difference between cate- 
gories decreased and that the entire 
intensity range covered by the re- 
sponses shifted to lower intensity 
values. 

In order to show the relation-
ships between the dosage scales and 
the temperature scale, corresponding 
threshold values were plotted with 
dosage as the parameter. These are 
the lines shown in Fig. 2. These lines 
have no significance other than to 
indicate the nature of transformation 
involved in going from the dosage 
to the temperature scale. The trans-
formation functions are in actuality 
empirical expressions of Buettner’s 
(1, 2) equation as developed by 
Lipkin and Hardy (6) and differ 
from expected values presumably only 
by virtue of the variability of initial 
skin temperatures. 

Of considerable interest are the 
SD’s associated with the thresholds 
since these indicate the precision or 
reliability of the judgments and may 
be used as an index of sensitivity. 
The SD’s of the intensity thresholds 
along the dosage lines of Fig. 2 were 
roughly one-third the magnitude of 
the thresholds. In general, these 
SD’s decreased with increasing dosage. 
The SD’s of the thresholds expressed 
as skin temperature were the follow- 
ing proportions of their associated 
thresholds: warmth, 1/13; heat, 1/10; 
pricking pain, 1/18; burning pain, 
1/20. 


DISCUSSION


Obtaining a pricking pain threshold 
at a significantly higher skin temperature 




than usually reported by Hardy may 
be due to differences between S groups, 
precision of skin temperature measure- 
ment, lack of complete heat dissipation 
as noted, or other factors. Important 
among these other possible factors is 
the present use of Lipkin and Hardy’s 
(6) and Lipkin, Bailey, and Hardy’s (7) 
most recently reported skin constants 
which differ somewhat from those used 
in their earlier studies (4). When these 
things are considered, the obtained 
difference of 1.8° C. does not seem 
importantly large. 

The method used produced results 
comparable to those of both method of 
limits and prolonged exposure studies. 
Thus, all three methods are intersup- 
porting. In particular, since the present 
method yielded pricking pain thresholds 
comparable to those obtained by Hardy 
with the method of limits, some credence 
is added to the other threshold values 
obtained in the present study. This 
evidence, however, requires verification 
through further experimental study. 

The results agree with those of Lele, 
Weddell, and Williams (5) who used 
the prolonged exposure, constant in- 
tensity method in showing that thermal 
sensations of the same sort may occur 
at a variety of heat intensities. The 
present results qualify their conclusion 
in that warmth was not obtained at 
high intensities nor burning pain at 
low ones. The failure to elicit burning 
pain at low rates of heat exchange is 
presumed to be due to the near equi- 
librium of rate of heat absorption and 
rate of heat loss at the test spot. The 
failure to elicit warmth at high intensities 
is presumed to be due to the rapid rise 
of skin temperature associated with 
high intensities with the result that 
with such increasing intensity, the la- 
tency of warmth reaches a physiological 
limit. At the same time it may be 
hypothesized that the latencies of heat, 
pricking pain and burning pain are 
decreasing and approaching that of 
warmth. Thus, for a given high in- 
tensity the perception time of warmth 
is so short and the perception time of the 
next sensation in such close temporal 





sequence that S has little or no op-
portunity to perceive warmth before 
perceiving heat. It may also be hypoth- 
esized that with still higher intensities, 
a “collapsing” of response latencies 
occurs with the result that with increas- 
ingly high intensities, heat and then 
pricking pain are not elicitable. It 
also follows that with still greater 
increases in intensity, not only would 
burning pain be the only elicitable 
response, but it would become inde- 
pendent of exposure time. These hypoth- 
eses tend to be substantiated by the 
results shown in Fig. 1. 

Even though the results agree with 
the conclusions of Lele, Weddell, and 
Williams (5) as indicated above, they 
also show very clearly the inadequacy 
of specifying the heat source as a stimu- 
lus in terms of heat intensity alone. 
It is necessary for this purpose to 
specify both intensity and exposure 
time or dosage and either intensity 
or exposure time. If the latter is done, 
it must be kept in mind that a different 
subjective continuum exists for each 
dosage level. The failure of Lele, Wed- 
dell, and Williams (5) to compare their 
results to those of Hardy, Wolff, and 
Goodell (4) in terms of more than one 
heat parameter seems to account, to 
a great extent, for their claim of discrep- 
ant pricking pain results. The present 
data are in reasonable accord with those 
of Hardy, Wolff, and Goodell. 

The present finding that dosage is 
not a psychophysical constant has im- 
portant theoretical implications. Since 
skin temperature is directly related 
to heat intensity, but exponentially 
related to exposure time (1, 2, 3), the 
skin temperature threshold of a 
given sensation must vary for constant 
dosages of variable intensity and time 
values. It is conceivable, therefore, 
that skin temperature may be a psycho-
physical constant for some, if not all, 
cutaneous thermal sensations. That this 
is the case for pricking and burning 
pain has been demonstrated repeatedly 
by Hardy and his colleagues. It seems 
to the writer that the question is still 




open for the other thermal cutaneous 
responses. 

Even though the results indicate that 
thermal sensations may be related to 
heat exchange parameters, they also 
indicate that they may be related with 
greater reliability and/or sensitivity to 
skin temperature. In addition to hav- 
ing this desirable feature, skin tempera-
ture has the advantage of being both 
unidimensional and the more proximal 
stimulus. 


SUMMARY 


A study was conducted of the S-R relation- 
ships involved in cutaneous thermal subjective 
responses to radiant heat exchange. Using 
the method of single stimuli, six Ss responded 
to randomly arranged stimulus combinations 
of heat intensity and exposure time by assigning 
each to one of five subjective response categories. 
The data obtained were analyzed in terms 
of the relationship of the responses to heat 
intensity, exposure time, dosage, and skin 
temperature. The results appear to permit 
the following conclusions: 

1. The present method produces results 
comparable to those obtained by the method 
of limits and with a “prolonged exposure pro-
cedure.” 

2. Sensations of warmth are not elicitable 
at very high heat intensities, nor are sensations 
of burning pain elicitable at very low ones. 

3. For the stimulus situation to be adequately 
described, two characteristics of the heat 
exchange must be specified. These may be 
either heat intensity and exposure time or 
dosage and either intensity or time. However, 
the results indicate that the most useful S-R 
relationship is that between sensation and skin 
temperature. 


4. The results agree with those of Hardy 
and his colleagues in the relationship of pricking 
pain to heat exchange and skin temperature 
and, in this regard, therefore, do not agree 
with those of Lele, Weddell, and Williams. 


REFERENCES 


1. Buettner, K. Effects of extreme heat and 
cold on human skin. I. Analysis of 
temperature changes caused by different 
kinds of heat application. J. appl. 
Physiol., 1951, 3, 691-702. 

2. Buettner, K. Effects of extreme heat and 
cold on human skin. II. Surface tem-
perature, pain and heat conductivity 
in experiments with radiant heat. J. 
appl. Physiol., 1951, 3, 703-713. 

3. Hardy, J. D. Summary review of the 
influence of thermal radiation on human 
skin. U. S. Naval Air Development 
Center, Rep. No. NADC-MA-5415, 
Nov. 1954. 

4. Hardy, J. D., Wolff, H. G., & Goodell, H. 
Pain sensations and reactions. Balti-
more: Williams & Wilkins, 1952. 

5. Lele, P. P., Weddell, G., & Williams, 
C. M. The relationship between heat 
transfer, skin temperature and cutaneous 
sensibility. J. Physiol., 1954, 126, 206-
234. 

6. Lipkin, M., & Hardy, J. D. Measurement 
of some thermal properties of human 
tissues. U. S. Naval Air Development 
Center, Rep. No. NADC-MA-5405, 
April 1954. 

7. Lipkin, M., Bailey, O., & Hardy, J. D. 
The effect of ultra-violet irradiation 
upon the cutaneous pain threshold. 
U. S. Naval Air Development Center, 
Rep. No. NADC-MA-5414, Oct. 1954. 

8. Weddell, G. Somesthesis and the chemical 
senses. Annu. Rev. Psychol., 1955, 6, 
119-136. 

9. Woodworth, R. S., & Schlosberg, H. 
Experimental psychology. New York: 
Henry Holt, 1954. 
(Received November 27, 1956) 





Journal of Experimental Psychology 
Vol. 54, No. 6, 1957 


THE ROLE OF PREFEEDING IN AN APPARENT 
FRUSTRATION EFFECT¹ 


JOHN P. SEWARD, A. CLINTON PEREBOOM,² BRUCE BUTLER, 
AND ROBERT B. JONES 


University of California, Los Angeles 


To handle the phenomena of goal 
striving, the senior author has else- 
where proposed the concept of tertiary 
motivation (5). A tertiary drive 
(D3) is produced when a response is 
strongly but partially activated under 
conditions that prevent its completion. 
Eating movements, for example, con- 
ditioned to an antedating situation 
that tends to evoke them before food 
becomes available, give rise to drive 
stimuli; these in turn energize other 
responses in progress. Sheffield, Roby, 
and Campbell (6) have suggested the 
same mechanism and used it to ac- 
count for the selection of instrumental 
responses commonly termed rein-
forcement. 

If D3 depends on the partial 
frustration of an anticipatory goal 
response (rG), anything that enhances 
that condition should heighten drive. 
Consider a hungry rat accustomed to 
finding one food pellet halfway down 
a runway and a second pellet at the 
end. In the first half of the runway 
intensity of rG is determined jointly 
by the generalization gradients from 
both points of reward; in the second 
half, by that of the second point 
combined with perseverative traces 
from the first. If, then, on a certain 
trial the rat found no pellet at the 
halfway mark, rG would be blocked at 
its point of highest intensity. The 
frustration thus produced should so 
modify the perseverative component 
as to strengthen D3, thereby increasing 
speed in the second half of the 
runway. 

¹ Aided by a grant from the Research Com-
mittee, University of California. 
² Now at Texas Technological College. 

After planning an experiment to 
verify this prediction it was found 
that it had already been done in a 
different theoretical context. Seek- 
ing evidence on the drive properties of 
frustration, Amsel and Roussel (1) 
trained rats in an enclosed runway 
with two goal boxes 10 ft. apart. 
After prolonged training to a pellet in 
each goal box a test series was run 
with the first pellet omitted on half 
the trials. On these trials both 
latencies and running times to the 
second goal box were consistently re- 
duced, a finding verified in a further 
study by Roussel (4). 

As the authors pointed out, how-
ever, this result was not necessarily 
due to frustration. It might have 
been due, not to an added drive on 
one-pellet trials, but to diminished 
drive on two-pellet trials—diminished, 
i.e., by eating the first pellet. This 
possibility could be tested by measur-
ing the effect of prefeeding apart from 
that of frustration. 

The effect of prefeeding is also of 
interest for the D3 hypothesis. At 
the outset this interest was largely 
empirical. Evidence presented by 
Maltzman (3) suggested that pre-
feeding a small amount would induce 
faster running. But the conditions 
under which this would happen were 
not clear, nor did the present theory 
help much. Arousal of the con-
summatory eating response would be 
expected to alter its antedating com-
ponent. But since the drive function 


FIG. 1. Ground plan of apparatus. PB, pre-
feeding box; SB, start box; A and B, runways; 
GB1 and GB2, goal boxes; P1 and P2, photocells; 
OD, opaque door; TD, transparent door. 
Small circles are food dishes. 














of rG was assumed to depend on its 
blocking, how would D3 vary with its 
prior completion? No specific pre-
diction was attempted, at least for the 
part of the experiment that is the 
present concern. (As will appear 
below, however, the findings pointed 
the theory toward a definite commit- 
ment.) 

It was decided, therefore, to proceed 
with the experiment and to include a 
prefeeding factor, partly to evaluate 
the frustration effect and partly to 
throw further light on the motivating 
function of rG. 


METHOD 
Apparatus 


Figure 1 gives a ground plan of the apparatus. 
It consisted of a continuous runway marked off 
by guillotine doors into six sections, as follows: 
(a) a prefeeding box, PB; (b) a start box, SB; 
(c) a runway, A; (d) a connecting box, GB1; 
(e) a second runway, B; (f) a final goal box, GB2. 
Both runways and SB were enclosed by 3-in. 
walls and covered with hardware cloth. The 
other boxes were 6 in. high; PB and GB, had 
Plexiglas tops, GB, had hardware cloth. Walls 
and floor were flat black throughout. On a 
superstructure running the length of the ap- 
paratus at a height of 18 in. were mounted five 
74-w. lamps; cheesecloth hanging down on both 
sides provided a one-way screen when the room 
was darkened. By means of mirrors at both 
ends E could stand midway between and see into 
all boxes. 

Two doors, one opaque, the other transparent, 
separated GB1 from Runway B. Raising the 
transparent door automatically started two 
electric timers. The first, making 1 rps, was 
stopped by a photocell 6 in. from the door to give 
starting latencies. The second timer made 1 
rpm and was stopped by another photocell 6 in. 
from the door of GB2. 

Small aluminum food dishes were screwed to 
one wall of PB, GB1, and GB2, located as shown 
in Fig. 1. 


Subjects 


Thirty-two male hooded rats 3-44 mo. old 
were started; 11 of these were eliminated for 
failure to adjust normally to the apparatus—6 
during adaptation and 5 in training. Data are 
based on the remaining 21 Ss, distributed in four 
replications of 5, 6, 6, and 4 Ss. 


Design 


The experiment was in two parts, training and 
test, each consisting of four trials a day for 12 
days. The purpose of training was to accustom 
S to the sequence of taking Runway A from SB 
with or without a prefeeding in PB, finding and 
eating a pellet in GB1, then proceeding by B to 
another pellet in GB2. The purpose of the test 
was twofold: (a) to measure the effect of frustra-
tion in GB1 on performance in B; (b) to measure 
the effect of prefeeding in PB on performance in 
B with and without frustration. This called for 
a 2 × 2 factorial design with four treatment 
combinations. If we let P and p stand for 
presence and absence of prefeeding, respectively, 
and denote frustration by F and its absence by f, 
these treatment combinations are as follows: 
pf, Pf, pF, PF. 

On every test day each S experienced all four 
conditions; in training he had two trials of Pf 
and two of pf. To counterbalance order of 
treatments within days the four combinations 
were assigned to the cells of a 4 × 4 Latin square. 
By rearranging the columns in all possible 
permutations 24 such squares were formed. 
Twelve of them were used to provide a schedule 
for the 12 days of testing. The other 12, with 
all F’s replaced by f’s, served the same purpose 
in training. The Ss were distributed fairly 
equally among the rows of this series of squares. 
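
The counterbalancing scheme above can be sketched as follows. The article does not give the original square or say which 12 of the 24 squares were used for testing, so the cyclic base square and the first-12/last-12 split below are assumptions, and all names are hypothetical. Each row is read as the order of trials for one S on one day.

```python
from itertools import permutations

TREATMENTS = ("pf", "Pf", "pF", "PF")


def base_square(items):
    """Cyclic Latin square (an assumed starting square)."""
    n = len(items)
    return [[items[(r + c) % n] for c in range(n)] for r in range(n)]


def column_permuted_squares(square):
    """All squares obtained by rearranging the columns (n! of them)."""
    n = len(square)
    return [[[row[c] for c in order] for row in square]
            for order in permutations(range(n))]


def is_latin(square):
    """Every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


squares = column_permuted_squares(base_square(TREATMENTS))

# Assumed split: first 12 squares for the test days, the rest for training.
test_schedule = squares[:12]
# For training, every F is replaced by f, so each row holds two pf and two Pf.
training_schedule = [[[cell.replace("F", "f") for cell in row] for row in sq]
                     for sq in squares[12:]]

print(len(squares), all(is_latin(sq) for sq in squares))
```

Rearranging columns keeps the Latin property, since each row and each column still contains every treatment once; only the trial position at which each treatment falls is changed.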

One other variable entered the design, or 
rather two variables confounded with each other: 
F-duration, or length of delay in GB1 on frustra-
tion trials; and amount of food received in the 
apparatus. They entered in this way: Delay 
interval for each S was determined by his median 
eating time in GB1 for all training trials. Al-
though Amsel and Roussel found no significant 
difference among delays of 5, 10, and 30 sec., it 
was decided to vary this factor. Accordingly 
11 Ss (those in the first and fourth replications 
and two in the third) were assigned to Group K 
and given 1 gm. of food in GB2 and, on appro-
priate trials, in PB and GB1; the other 10 Ss, 
making up Group k, received 4 gm. in each box. 
F-intervals for Group K ranged from 12.5 to 28.5 
sec. with mean and median at 19; for Group k, 
from 3.5 to 12.5 sec., mean and median 6.8. 


Procedure 


Maintenance.—Subjects were housed three to 
a cage and fed Steenbock mash once a day. 
During the first week, feeding time was cut 
progressively from 1 hr. to 30 min. in the first 
three replications and to 45 min. in the fourth. 
(In Replication 3 it was raised again to 45 min. 
after one week of training.) 

Adaptation.—Adaptation began after seven 
days on food cycle. All trials, as in the training 
and testing to follow, were run before feeding. 

Day 1: Subjects were given 30 min. to explore 
the entire apparatus in groups of about three. 

Day 2: Each S was given 10 min. to explore 
the apparatus alone. 

Day 3: The Ss received the following three 
treatments in rotation: (a) He was put in GB2 
and allowed 2 min. to eat a soft pellet of his 
accustomed diet. (b) He was permitted to run 
from the middle of Runway B to GB2 and to eat 
a pellet there. (c) He had one timed trial from 
GB1 to a pellet in GB2. 

Day 4: The same three treatments were given 
in the other half of the apparatus. Each S ate 
three pellets in GB1: (a) after being put in 
directly, (b) after entering from Runway A, and 
(c) after a timed trial from SB. 

Day 5: The Ss had two timed trials in rota- 
tion, on each trial traversing both runways in 
succession and finding a pellet in both end boxes. 
One trial started in SB; the other started in PB, 
where S was prefed one pellet. Procedure in 
these trials was as described below for conditions 
pf and Pf. 

Training.—As noted in the section on design, 
training lasted 12 days and consisted of 4 trials 
a day, 2 pf and 2 Pf, in counterbalanced orders. 
Subjects were run in rotation with about 10 to 15 
min. between trials. 

Testing.—As also noted above, there were 12 
days of testing with 4 trials a day—pf, Pf, pF, 
and PF—counterbalanced from day to day and 
from S to S. Intertrial intervals fell largely 
between 8 and 12 min. 

Procedure on a single trial of each type was as 
follows: 


pf: Both goal boxes were baited and S was 
put directly in SB. When S nosed the door, 
E raised it and started a double-event stop-
watch, stopping one hand when S entered GB1 
and stopping the other when S finished the 
pellet and nosed the door. At that point E 
raised the opaque door of GB1, counted to 3, and 
raised the transparent door. He removed S 
from GB2 after the pellet was eaten. 

Pf: After baiting PB, GB1, and GB2, E put 
S in PB. When S finished the pellet he was 
admitted to SB and, on nosing the exit door, to 
Runway A. The rest of the trial duplicated pf. 

pF: Only GB2 was baited; all food particles 
were removed from GB1. On entering GB1, S 
was detained for his median eating time, as 
measured in training, before E raised the opaque 
door. Otherwise the trial was the same as pf. 

PF: This trial was identical with pF except 
that S was put in PB, where he ate a pellet 
before starting his run. 


RESULTS 
Runway B Times on Test Trials 


One intended measure, starting 
latency, was invalidated because it 
varied with speed of raising the door, a 
defect that was not corrected until the 
third replication. Runway time was 
free of this error, since it was obtained 
by subtracting the time to the first 
photocell from the time to the second. 
This measure, to the nearest .1 sec., 
was therefore selected for analysis. 

Inspection of the raw data showed 
no trend during the test period either 
from day to day or from trial to trial 
within days. Therefore, the median 
runway time was obtained over all 
test days for each S for each type of 
trial. Separating the Ss given l-gm. 
and }-gm. pellets into Groups K and 
k, respectively, mean times were 
obtained (Table 1). 

The table suggests a tendency for 
frustration (F) trials to be faster than 


TABLE 1 


MEAN OF MEDIAN TIMES (SEC.) IN RUNWAY B 
FOR ALL TEST TRIALS UNDER EACH CONDITION 


Treatment 
Group Mean 
pF PF pf 
K (N = 11) 
k (N = 10) 


2.69 2.80 
2.97 | 3.16 
2.82 
| 


2.97 


Mean 







TABLE 2 


ANALYSIS OF VARIANCE OF DATA 
AVERAGED IN TABLE 1 


Source 


Between Ss 
Pellet Size (K) 
Se within K (S) 
Within Ss 
Frustration (F) 
Prefeeding (P) 


j 
1.10 917 O1 13.75 | OO1 
3 600 O25) 3.75) 10 


2 
1.92.20 | 2.88) 10 
i 


| 
* Ratio between mean squares for treatment and 
interaction of treatment with Ss within K. 
** Treatment MS divided by pooled error obtained 
by adding last three sums of squares, giving df 57, 
MS .08. 


f trials. More precisely, the differ-
ence between the combined means for 
F and f was .23 sec. Of the 21 Ss, 
17 had faster median scores for F than 
f trials, 3 had slower, and 1 had equal 
scores. The likelihood of such a one-
sided distribution by chance alone is 
given by the binomial expansion as 
.002 (two-tailed). These results can 
be compared with Amsel and Roussel’s 
running times. They found a mean 
difference of .6 sec. and a median 
difference of .42 sec. All 18 of their 
Ss had faster medians on frustrated 
than rewarded trials and 16 of them 
had faster means. 

Table 1 also suggests that prefeed-
ing (P) trials were slightly slower than 
p trials. The over-all difference be-
tween P and p was .2 sec.; for 15 Ss 
the P median was slower than the p, 
for 5 Ss the P median was faster, and 
for 1 S the two scores were equal. 
On the null hypothesis the probability 
of such a sample is .04 (binomial test, 
two-tailed). 

To test the significance of these 
differences an analysis of variance was 
done, with the results shown in Table 
2. It may be seen that the “frustra-
tion effect” was significant at the .01 
level whether tested against its own 
error term or the pooled error. Pre- 
feeding in relation to its own error was 
significant at the .025 level, but when 
tested by pooled error it failed to 
reach the .05 level. It should be 
noted that F and P did not interact 
significantly. As to K, the difference 
between I-gm. and }-gm. pellets 
proved insignificant, as did the inter- 
action of this factor with frustration. 

Disregarding for the moment the 
borderline significance of the P effect, 
the P and K dimensions were col- 
lapsed, retaining only the division 
between F and f trials. This was 
done preparatory to drawing the 
curves presented in Fig. 2, designed 
for readier comparison with Amsel 
and Roussel’s results. To plot these
curves we first computed for each S 
the daily median of four trials during 
training and the F and f medians 
based on two trials a day during the 
test period. Each point on a curve 
represents the median of these 
medians. 
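
The plotting rule just described (a median for each S, then the median of those medians) can be sketched as follows. The running times and subject labels are invented for illustration only; the article does not report individual scores.

```python
from statistics import median


def daily_point(trial_times_by_subject):
    """Median across Ss of each S's own median running time for one day."""
    subject_medians = [median(times) for times in trial_times_by_subject.values()]
    return median(subject_medians)


# Hypothetical training day: four trials (sec.) for each of three Ss.
example_day = {
    "S1": [3.1, 2.9, 3.4, 3.0],
    "S2": [2.6, 2.8, 2.7, 3.3],
    "S3": [3.8, 3.5, 3.6, 4.0],
}
point = daily_point(example_day)
print(point)
```

On test days the same function would be applied separately to the two F trials and the two f trials of each S.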

The figure bears out the initial 
impression that after the first few 
days of training nonfrustrated trials 
reached a stable level of runway time.
Confirming Amsel and Roussel (cf. 1,
Fig. 1), frustration trials showed a 
further dip below that level. Aside 
from the sharp and unexplained rise 
on Test Day 7 and reversal on Day 8, 
all points on the F curve fall below the 
corresponding f points. On the assumption that F and f times were really equal for this sample of Ss, the probability of 10 out of 12 differences in the same direction (two-tailed binomial test) is less than .01.

Fig. 2. Median times in Runway B. (Plot not reproduced: separate curves for f trials and F trials; horizontal axis, training days followed by test days.)


Runway A Times on Test Trials 


Originally these times had been of 
incidental interest only. However, 
they proved to be essential to an 
interpretation of the main findings. 
That is, it became important to know 
how prefeeding affected performance, 
not merely in Runway B but in A. 
We therefore determined for each S (a) the median time of the two P trials and the two p trials on each test day, and (b) the median of these daily medians separately for P and p. We now had two sets of medians provided by the same 21 Ss (inspection showed no significant difference between groups K and k). The means were as follows: P = 5.28 sec., p = 4.82 sec. The difference of .46 sec. gave a t for related samples of 3.07, with P < .01 (two-tailed). Eighteen Ss had differences in the same direction.
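
As a rough check on the related-samples t reported above, the sketch below shows how such a t is formed from paired scores and what the reported figures imply about the variability of the differences. The helper name is illustrative, the individual scores are not available, and the critical value is the standard tabled two-tailed .01 value for 20 df.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """t for related samples: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(first, second)]
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


# Back-calculation from the summary values given in the text.
reported_diff = 0.46   # sec., P mean minus p mean
reported_t = 3.07
n_subjects = 21
implied_se = reported_diff / reported_t        # standard error of the mean difference
implied_sd = implied_se * sqrt(n_subjects)     # SD of the 21 individual differences

T_CRIT_01_DF20 = 2.845  # tabled two-tailed .01 critical value, df = 20
significant_at_01 = reported_t > T_CRIT_01_DF20

print(round(implied_se, 3), round(implied_sd, 3), significant_at_01)
```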


Discussion 


Amsel and Roussel trained rats in a 
two-section runway with a pellet in each 
section and found that on trials when the 
first pellet was withheld the rats ran 
faster to the second pellet. The same 
thing was found in the present experi- 
ment, though less strikingly. The ques- 
tion of interpretation remains. Amsel 
and Roussel held that speed was in- 
creased on nonreinforced trials by a 
frustration drive. They considered, but 
rejected, the possibility that speed was 
decreased on reinforced trials by the 
drive-reducing action of the first pellet. 
The present experiment was designed 
to test these alternatives. 

Although the prefeeding effect in Tables 1 and 2 was small and of borderline significance, it supports the conclusion that eating a pellet before a trial retards running speed. Table 1 contains further evidence that speed varies inversely with amount of prefeeding. If we assign to each column the number of pellets consumed, either in PB or GB₁, for that condition, we obtain the following: for pF, 0; for PF and pf, 1 each; for Pf, 2. If we now rank the mean running times at the foot of the table we find a close parallel. The differences from one column to the next are not significant, but the trend is interesting.
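
The pellet count assigned to each column follows mechanically from the two factors, as the small sketch below illustrates. It assumes one pellet for prefeeding and one pellet in the first goal box on nonfrustrated trials; the function and variable names are illustrative.

```python
def pellets_before_runway_b(prefed: bool, frustrated: bool) -> int:
    """Pellets eaten before the run in Runway B under one test condition."""
    # One pellet if prefed (P), plus one if the first goal box was baited (f).
    return int(prefed) + int(not frustrated)


# Condition labels as in the text: P/p = prefed or not, F/f = frustrated or not.
conditions = {
    "pF": (False, True),
    "PF": (True, True),
    "pf": (False, False),
    "Pf": (True, False),
}
counts = {name: pellets_before_runway_b(*flags) for name, flags in conditions.items()}
expected_speed_order = sorted(counts, key=counts.get)  # fewest pellets first

print(counts)
```

If speed varies inversely with prefeeding, mean running times should rise along this ordering, with PF and pf tied in the middle.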

Since the F effect was more pronounced than the P effect, it might be argued that the former had two components: a retarding effect of the pellet in GB₁ on f trials, and an accelerating effect of frustration on F trials. But if one accepts prefeeding as a factor, another explanation of the stronger F effect is at hand: a pellet just eaten in GB₁ might lower speed in Runway B more than a pellet in SB 10 to 20 sec. earlier. (It may be noted that the insignificant difference between PF and pf in Table 1 is in the expected direction.) To check this possibility a test is needed of the immediate effect of prefeeding uncomplicated by frustration.

Fortunately such a test is available in the times for Runway A. It can be assumed that no frustration was present, since on p trials S was put directly in SB, where he had never been fed. As reported, running times for P were significantly slower than for p, with a larger mean difference than in Runway B. It therefore seems more economical to explain the present “F effect” by a slowing down on f trials than to add the hypothesis of an accelerating effect of frustration.

Why should prefeeding a pellet make an animal run more slowly? Amsel and Roussel mention hunger reduction. But one pellet would hardly alter the blood chemistry appreciably, nor did the present Ss run slower from trial to trial. There is indirect evidence (2) that one pellet may stop hunger contractions of the stomach; but in this case the difference between pf and pF (one pellet vs. none) should be greater than between Pf and PF (two pellets vs. one), whereas no interaction was found.







The simplest possibility is that an artifact entered. If on some P trials S entered Runway A before he had finished the pellet, his chewing and swallowing might interfere with running; the same could happen on f trials in B. Amsel and Roussel avoided this difficulty by using a 30-sec. delay on rewarded trials. But if it were a serious source of error in the present experiment, we should have obtained a stronger F effect than they did instead of a weaker one. And it would not explain the slight P effect that carried over to Runway B.

A third interpretation may be derived 
from the hypothesis that an incomplete 
goal response produces “tertiary” drive. 
It involves the further, but plausible, 
assumption that completion of rg tem- 
porarily reduces D,. Given a hungry 
organism in a food place, it may be as- 
sumed that there is, in addition to hun- 
ger, a drive “to eat.” If food is now 
withheld this motive is enhanced; if a 
small quantity is given the motive is 
weakened, at least for the moment. 

This proposal has an obvious economy. 
It is, however, strictly ad hoc, and its 
economy consists of predicting two ef- 
fects, only one of which has been demon- 
strated. To establish a motivating ef- 
fect of frustration apart from the 
demotivating effect of prefeeding, a dif- 
ferent experimental design is required. 


SUMMARY 


To test the motivating effect of blocking a consummatory response, a method used by Amsel and Roussel was adopted and extended to include a measure of the effect of prefeeding, an uncontrolled factor in their design.

Twenty-one rats were given 4 trials a day for 12 days of training and 12 days of testing in a two-part enclosed runway. They were trained to run both 9-ft. parts in succession and to eat a pellet in a goal box at the end of each section.

In test trials each S was run under four conditions determined by presence and absence of “frustration” and presence and absence of prefeeding combined in a 2 × 2 factorial design. Half the trials were normal, i.e., the first goal box was baited as usual; the other half were “frustrating,” i.e., the first goal box was empty. Before half the trials of each type S ate a pellet. Results were as follows:

1. Running times in the second half of the runway were significantly slower on normal trials than on frustrating ones, confirming Amsel and Roussel.

2. On prefed trials running times were slower than on trials without prefeeding. The difference was highly significant for the first part of the runway, barely so for the second part.

The second finding suggested that the first could be due to a retarding effect of prefeeding rather than a motivating property of frustration. Our results were related to a “tertiary drive” hypothesis.


REFERENCES 


1. Amsel, A., & Roussel, J. Motivational properties of frustration: I. Effect on a running response of the addition of frustration to the motivational complex. J. exp. Psychol., 1952, 43, 363-368.

2. Elliott, M. H., & Treat, W. C. Hunger-contractions and rate of conditioning. Proc. Nat. Acad. Sci., Washington, D. C., 1935, 21, 514-516.

3. Maltzman, I. The process need. Psychol. Rev., 1952, 59, 40-48.

4. Roussel, J. S. Frustration effect as a function of repeated non-reinforcements and as a function of the consistency of reinforcement prior to the introduction of non-reinforcement. Unpublished master’s thesis, Tulane Univer., 1952.

5. Seward, J. P. How are motives learned? Psychol. Rev., 1953, 60, 99-110.

6. Sheffield, F. D., Roby, T. B., & Campbell, B. A. Drive reduction versus consummatory behavior as determinants of reinforcement. J. comp. physiol. Psychol., 1954, 47, 349-355.


(Received October 25, 1956)





Journal of Experimental Psychology 
Vol. 54, No. 6, 1957 


RESISTANCE TO EXTINCTION AS A FUNCTION OF THE
DISCRIMINATION HABIT ESTABLISHED
DURING FIXED-RATIO REINFORCEMENT ¹


M. RAY DENNY, RUTH H. WELLS, AND JACK L. MAATSCH ²


Michigan State University 


In the instrumental bar depression 
situation, Skinner (8) treats the
“bar-pressing habit” as but one
element in a chain of reflexes terminat- 
ing in reinforcement. If “partial” 
reinforcement is involved, then the 
unit of response consists of a chain 
of nonrewarded bar presses which 
terminates with a rewarded bar press; 
and it is this chain, not the individual 
bar press, that acquires increased 
strength. If only every fifth bar 
press were rewarded, for example, 
then each response unit during ac- 
quisition would contain five times 
as many bar presses as the response 
unit (single bar press) of continuous 
reinforcement, and this schedule of 
reinforcement should provide approxi- 
mately five times as many bar presses 
during extinction as the same number
of continuous reinforcements (7).

The purpose of the present study 
is to make a more detailed analysis 
of the effects of partial reinforcement. 
The major hypothesis is that in- 
creased resistance to extinction under 
partial reinforcement is dependent 
upon S’s learning to approach the 
food tray after making the response 
which will be followed by food and 
to avoid the food tray after making a 
response which will not be followed 
by reward. This discrimination, the 
discrimination habit of partial reinforcement, is presumably based on the discriminative stimuli which are present immediately after each to-be-rewarded response has been executed. In the typical bar-pressing situation the distinctive “click” associated with the operation of the food magazine is probably the main cue.

¹ Based upon research by the second author in partial fulfillment of the requirements for the M.A. degree, Michigan State Univer., 1952.

² Now at the RAND Corporation, Santa Monica, California.


The present experiment has employed 4:1 fixed-ratio reinforcement. For the purpose of explicating the theory, let us examine this situation in detail. As is customary, S first received a series of continuous reinforcements and every time the bar was depressed ($R_{\text{bar}}$) the “click” of the feeding mechanism, the sight and the sound of the food falling into the tray, etc. ($S_{+\text{cue}}$) were present just prior to S’s approaching the food tray ($R_{\text{approach}}$). However, with the introduction of 4:1 fixed-ratio reinforcement the $R_{\text{bar}} \to S_{+\text{cue}} \to R_{\text{approach}} \to S_{\text{food}} \to R_{\text{eat}}$ sequence could only occur on every fifth bar press. The four times that $R_{\text{bar}}$ was unrewarded the $S_{+\text{cue}}$ was absent and the resultant stimulus complex can be designated $S_{-\text{cue}}$. Since $R_{\text{approach}}$ to $S_{-\text{cue}}$ was nonrewarded it gradually extinguished, while $R_{\text{approach}}$ to $S_{+\text{cue}}$ continued to be elicited on every occasion. In other words, $R_{\text{approach}}$ did not need to accompany each $R_{\text{bar}}$ and could be instrumental only in the presence of $S_{+\text{cue}}$. On the other hand, every nonrewarded $R_{\text{bar}}$ was essential and became instrumentally chained to the terminal or rewarded bar press. Thus, there should be increased resistance to extinction of $R_{\text{bar}}$ but not of $R_{\text{approach}}$ following partial reinforcement.

Given perfect discrimination (approaching the food tray only after hearing the “click”), each $R_{\text{approach}}$ would be preceded by five $R_{\text{bar}}$’s as compared to one $R_{\text{bar}}$ in a completely continuous reinforcement situation. Assume further that a bar press rewarded on every trial and the fifth bar press of a 4:1 fixed-ratio block (both conditions equally often reinforced) would both extinguish at the same rate solely as a function of the number of nonrewarded occurrences during the extinction period. It then follows that there should be approximately five times as many bar presses to reach the extinction criterion following 4:1 fixed ratio as following continuous reinforcement.
Given incomplete discrimination, it is necessary to take into consideration the percentage of $R_{\text{approach}}$ which has not been discriminatively extinguished, since each such $R_{\text{approach}}$ represents an $R_{\text{bar}}$ which has not yet been chained to the terminal bar press. Let $D$ (degree of discrimination obtained under fixed-ratio reinforcement) be defined as follows:

\[ D = 1 - \frac{A - 1}{N} \]

where $A$ is the mean number of approaches to the food tray made at the end of the acquisition period within a fixed-ratio block and $N$ is the number of nonreinforced bar presses within a block. Then the mean number of bar presses to extinction for any given ratio of reinforcement ($E_{FR}$) may be estimated from the mean number of bar presses to extinction obtained after 100% reinforcement of the response to an asymptotic level of performance ($E_{100}$), according to the following equation:

\[ E_{FR} = E_{100} + D\,(E_{100} \times N). \]

When $D = 0$, resistance to extinction is the same as for 100% reinforcement; and when $D = 1$, $E_{FR}$ is equal to $E_{100}$ multiplied by the number of bar presses in a fixed-ratio block.

Since the mathematized part of the preceding analysis was actually made possible by the nature of the data obtained in the present study, the original hypotheses which were tested are: (H.1) The discrimination habit of partial reinforcement will become evident during fixed-ratio acquisition. By this is meant that the number of approach responses to the food tray following the fifth, or rewarded, bar press will remain unchanged while the number of approach responses following the nonrewarded bar presses (first, second, third, and fourth) will diminish. (H.2) If all Ss are initially trained under conditions of continuous reinforcement to an asymptotic level of performance, the total number of approach responses to the food tray during extinction, for all amounts of fixed-ratio reinforcement, will be a constant. That is, the number of approaches to the food tray will be the same as for continuous reinforcement while the number of bar presses will be greater.


METHOD


In order to have an independent and objective measure of the approach response to the food tray, S was placed in a 2-ft. alley with a food tray and “recording” treadle at one end and a horizontal bar at the other. The alley was 6 in. high, 4½ in. wide, lined with sheet metal, covered by hardware cloth, and lighted by a 7½-w. bulb 12 in. overhead. The weight of a rat readily operated the treadle and activated a microswitch. A pressure of 20 gm. on the bar activated the feeding mechanism and produced a distinctive auditory stimulus or “click” which, during training, occurred only at the time of the release of food. Water was always available in the box. All bar-pressing responses and approaches to the food tray were automatically recorded and the administration of the food pellets was automatically controlled.³

³ We are indebted to Mr. T. H. Maatsch for the design and construction of this device.

The Ss were 42 naive albino rats, 30 male and 12 female, 90-120 days old, from the colony maintained by the Psychology Department of Michigan State University.

During preliminary training all Ss were fed 9 gm. of Purina checkers every 24 hr. for five days. On Day 5 each S went on a 48-hr. food deprivation schedule; on Day 7 S was placed in the apparatus. For the first 10 min. the bar was in place but the food mechanism was empty. Two small pellets of food, however, were in the food tray. Operant level was measured during this period. Three Ss, not included in the 42 above, which made fewer than two bar-pressing responses during this time were discarded. During the next 10-15 min. each S after pressing the bar was rewarded for visiting the food tray. If necessary a scratching noise on the outside of the box was made to encourage an approach to the food tray. The reward was manually presented during this period to insure only one .05-gm. pellet per visit to food tray. The modal number of trials needed to promote spontaneous approach responses to the food tray was five.

After this preliminary training Ss were divided into six groups, two control (C) and four experimental (E). C-40 (N = 9) was given 40 continuously reinforced trials and then immediately extinguished. C-90 (N = 9) received 50 additional continuously rewarded trials before being extinguished. The two control groups permitted a control of drive level and number of reinforcements per se and made it possible to determine whether 40 reinforcements were sufficient to provide an asymptotic level of bar pressing.

Four experimental groups of six Ss each received varying amounts of fixed-ratio reinforcement following the 40 continuously rewarded trials, and then were immediately extinguished. Groups E-10, E-20, E-50, and E-80 received 10, 20, 50, and 80 fixed-ratio reinforcements, respectively. A 10-min. no-response extinction criterion was used for all experimental and control groups. During extinction the “click” of the empty food mechanism occurred on every fifth bar press for all experimental groups and on every bar press for the control groups. Each S received the sequence of preliminary training, acquisition, and extinction in one continuous experimental period lasting from 1½ to 5½ hr.
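
The six-group design can be laid out as a small table in code, which makes the acquisition totals easy to compare. The sketch assumes that each fixed-ratio reinforcement corresponds to one block of five bar presses (four unrewarded, one rewarded), as the 4:1 schedule implies; the dictionary layout and names are illustrative.

```python
PRESSES_PER_BLOCK = 5  # 4 unrewarded presses + 1 rewarded press (assumed block size)

# n = Ss per group; continuous = continuously reinforced trials; fr_blocks = fixed-ratio blocks.
groups = {
    "C-40": {"n": 9, "continuous": 40, "fr_blocks": 0},
    "C-90": {"n": 9, "continuous": 90, "fr_blocks": 0},
    "E-10": {"n": 6, "continuous": 40, "fr_blocks": 10},
    "E-20": {"n": 6, "continuous": 40, "fr_blocks": 20},
    "E-50": {"n": 6, "continuous": 40, "fr_blocks": 50},
    "E-80": {"n": 6, "continuous": 40, "fr_blocks": 80},
}


def acquisition_totals(group):
    """Return (bar presses, reinforcements) scheduled during acquisition."""
    presses = group["continuous"] + group["fr_blocks"] * PRESSES_PER_BLOCK
    reinforcements = group["continuous"] + group["fr_blocks"]
    return presses, reinforcements


total_subjects = sum(g["n"] for g in groups.values())
for name, g in groups.items():
    print(name, acquisition_totals(g))
```

The group sizes sum to the 42 Ss reported, and Group C-90 receives the same number of reinforcements as Group E-50, which is what allows number of reinforcements to be separated from schedule.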


RESULTS 


Figure 1A presents the course of the discrimination habit of partial reinforcement during fixed-ratio reinforcement for Group E-80. The discrimination improves with successive blocks of fixed-ratio reinforcements in a linear fashion. This means that there is a progressive elimination of the approach response after the first four bar presses but not after the fifth or rewarded bar press. The manner in which the responses of approaching the food tray drop out within a block during the course of partial reinforcement for Groups E-50 and E-80 can be seen in Fig. 1B. Approach following the first and third bar-press as depicted here increases near the end of acquisition, but this is compensated for by a sharp drop in approach response following Bar Presses 2 and 4. Curves of the latter two have been omitted for graphical clarity. In Fig. 1C are plotted the mean numbers of food tray approaches after each of the five bar presses within a block for both acquisition and extinction. Here it is evident that the discrimination holds up for all of the first four bar presses and that the rise in approach response near the end of acquisition is only temporary. The measures in Fig. 1B and 1C were obtained by computing the mean number of approaches per bar press on every tenth block of the acquisition series for Groups E-50 and E-80 combined. During extinction the measure was the same except that it was based on alternate blocks. The curves in Fig. 1D show the maintenance of the discrimination habit throughout extinction up to the point when the first S met the extinction criterion. In fact, there is an apparent sharpening of the discrimination during extinction, as the mean number of approaches to the food tray on nonrewarded bar presses is less than at the end of acquisition (Fig. 1B).

Fig. 1. The discrimination habit of partial reinforcement: A. The learning curve for this discrimination in Group E-80 in terms of the number of approaches to the food tray per block, B. The decline in food tray approaches during acquisition after the first, third, and fifth presses of a block in Groups E-50 and E-80 combined, C. Approach to tray after each of the five bar presses in a block throughout acquisition and extinction in Groups E-50 and E-80 combined, D. Approach to tray after the first, third, and fifth bar presses of a block during extinction by Groups E-50 and E-80 combined. (Plots not reproduced: four panels, A to D; vertical axes, mean number of approaches to tray per reward or per bar press; horizontal axes, blocks of fixed-ratio reinforcement, successive bar presses in a block, and extinction blocks.)

It should be pointed out that Groups E-50 and E-80 made significantly more bar presses during the last 10 min. of acquisition than any other group (F = 8.0, df = 5 and 36, P = .01). Failure to find such a result would have been an argument against our interpretation. For the acquisition of a discrimination habit of partial reinforcement implies spending a proportionately greater amount of time in bar pressing than traversing the alley to the food tray. Hypothesis 1 appears to be confirmed.
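
The per-bar-press measure used for Fig. 1B and 1C can be sketched as below. The sketch assumes that "every tenth block" means blocks 10, 20, 30, and so on, and that "alternate blocks" means every second block; the record layout (one list of five approach counts per block) and the data are hypothetical.

```python
def approaches_per_press(blocks, step=10):
    """Mean approaches after each press position, sampled every `step`-th block."""
    sampled = blocks[step - 1::step]  # blocks numbered step, 2*step, ...
    if not sampled:
        raise ValueError("not enough blocks for the chosen step")
    n_positions = len(sampled[0])
    return [sum(block[i] for block in sampled) / len(sampled)
            for i in range(n_positions)]


# Hypothetical record of 20 acquisition blocks; each block lists the approaches
# (0 or 1) made after bar presses 1 to 5, the fifth press being the rewarded one.
blocks = [[1, 1, 1, 1, 1]] * 9 + [[0, 0, 0, 0, 1]] \
       + [[1, 1, 1, 1, 1]] * 9 + [[1, 0, 1, 0, 1]]

acquisition_profile = approaches_per_press(blocks, step=10)
extinction_style_profile = approaches_per_press(blocks, step=2)
print(acquisition_profile)
```

For extinction records the same function would be called with step=2, following the article's use of alternate blocks.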

Figure 2 presents the data per- 
taining to Hypothesis 2. Here the 
mean number of responses to extinc- 
tion for both bar press and approach 
to food tray are plotted against the 
number of blocks of fixed-ratio rein- 
forcement. There is no increase in 
the number of approaches to the 
food tray with increasing blocks of 
fixed-ratio reinforcement (F = .13), 
and this number approximates the 
number of bar presses to extinction 
given by both Groups C-40 and C-90. 
In other words, the approach re- 
sponse, though partially reinforced, 


does not show an increased resistance to extinction. The number of bar presses to extinction, on the other hand, appears to be a negatively accelerated increasing function of the number of fixed-ratio reinforcements after response strength under continuous reinforcement has reached an asymptotic level.

The theoretical curve based on the equation, $E_{FR} = E_{100} + D\,(E_{100} \times N)$, is also plotted in Fig. 2. It may be noted that the agreement between the predicted points and obtained points is especially good at 50 and 80 reinforcements (E-50 and E-80). Since degree of discrimination (D) is based on the number of food-tray approaches made during the last two fixed-ratio blocks of the acquisition period, the measure of D for points 10 and 20 (E-10 and E-20) includes a greater proportion of early acquisition trials than for points 50 and 80 and thus underestimates the degree of discrimination attained.⁴ As a result, the fact that the predicted values at 10 and 20 fall below the obtained values is not unexpected.

A summary of the statistical analysis of the extinction data is presented in Table 1. The t’s support the trends noted in Fig. 2, that is, Groups E-20, E-50, and E-80 gave significantly more responses to extinction than all groups which received fewer fixed-ratio reinforcements except the groups directly below them. Hypothesis 2 is confirmed.


Discussion 


The net effect of the present analysis is to say that rats gradually learn a sequence of homogeneous bar-pressing responses during a partial reinforcement schedule much as a rat learns a sequence of heterogeneous turn responses in a maze, using whatever cues there are available. During continuous reinforcement such a stimulus as the click of the food magazine is no more a cue for visiting the food tray than many other stimuli attendant upon a bar press. But, during partial reinforcement, many of these stimulus elements are followed by nonreward and approach to these cues undergoes extinction. On the other hand, the stimuli which are exclusively associated with the to-be-rewarded bar press become specific cues for food-tray approach. Simultaneously, bar pressing per se gradually becomes a specific cue for further bar pressing and, to the degree to which these discriminations are made, the chain is established.

Fig. 2. The number of responses to extinction for both bar pressing and approaching food as a function of the number of blocks of fixed-ratio reinforcement. (Plot not reproduced: curves for bar press, theoretical bar press, and food-tray approach; vertical axis, mean number of responses; horizontal axis, blocks of fixed-ratio reinforcement, 0 to 80.)

⁴ A remedy for this underestimation of D might be to base D on the first block of extinction trials. Unfortunately the original records were lost in transit and this computation cannot be made.


TABLE 1


t Ratios for Mean Number of
Bar-Presses to Extinction


Groups E-80


C-40 + C-90 * | 3.62** | 4.58**
E-10 | 2.21 | 3.01*
E-20 .96 | 1.76
E-50 | .79


12.4
11.2


* P = .05
** P = .01

The discriminations involved are dif- 
ficult and the rat makes many errors 
in terms of taking trips to the food tray 
too soon, but to the degree to which 
the rat reduces its errors it shows 
increased resistance to extinction. To 
put it in other words, the rat only gets 
nonrewarded during extinction when 
it visits the empty food tray (“‘expects” 
reward). All groups, it may be recalled,
visited the food tray the same number 
of times (approx. 30) and, in a true sense, 
received the same number of non- 
rewarded extinction trials. 

One prediction from the equation is 
that the number of responses to extinc- 
tion following partial reinforcement can 
never be greater than the product of the (mean) number of responses in a reinforcement block and the mean number of responses to extinction following continuous reinforcement. An examination of the available literature on fixed-ratio, variable-ratio, periodic, and aperiodic⁵ reinforcement (3, 4, 5, 7, 8, 9) reveals that the number of responses to extinction is always less than this product. Thus the present hypothesis remains tenable.
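
The ceiling stated above is simple enough to express as a check, sketched below. The numbers are illustrative only and the function names are not from the article; the baseline of 30 responses is an assumed value for the continuous-reinforcement groups.

```python
def upper_bound(e_100: float, presses_per_block: float) -> float:
    """Largest number of responses to extinction the equation allows."""
    return e_100 * presses_per_block


def within_bound(observed: float, e_100: float, presses_per_block: float) -> bool:
    """True if an observed extinction total does not exceed the ceiling."""
    return observed <= upper_bound(e_100, presses_per_block)


E_100 = 30.0           # illustrative baseline after continuous reinforcement
PRESSES_PER_BLOCK = 5  # 4:1 fixed ratio

print(upper_bound(E_100, PRESSES_PER_BLOCK))
print(within_bound(120.0, E_100, PRESSES_PER_BLOCK))
print(within_bound(160.0, E_100, PRESSES_PER_BLOCK))
```

The ceiling follows from the prediction equation itself: since D cannot exceed 1, the predicted total cannot exceed the baseline multiplied by the number of presses in a block.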

The present analysis of the effects of 
partial reinforcement fits in well with 
the interference-type theories of extinc- 
tion which are being revived (1, 2, 6). 
These theories stress the analysis of 
stimulus and response variables and 
the importance of discriminative learning 
in all extinction situations regardless 
of whether reinforcement is partial or 
complete. 


SUMMARY 


Two control (C) groups of nine Ss each and four experimental (E) groups of six Ss each were run in a Skinner box so designed as to obtain independent measures of the bar-pressing response and approach to the food tray. All groups first received 40 continuous reinforcements. C-40 was immediately extinguished, while C-90 received 50 additional continuous reinforcements and was then extinguished. Groups E-10, E-20, E-50, E-80 received 10, 20, 50, and 80 blocks of 4:1 fixed-ratio reinforcement, respectively, prior to extinction. A “click” accompanied each rewarded bar-press during acquisition and every fifth bar-press during extinction.

Given asymptotic performance level under continuous reinforcement, resistance to extinction of a bar-pressing response was found to be a negatively accelerated increasing function of the number of blocks of fixed-ratio reinforcement, while approach to the food tray was found to be a constant for all experimental groups and was of the same order as the bar-pressing level in the control groups. A discrimination habit of partial reinforcement (learning which bar press is followed by reward and which are not) was clearly revealed when the experimental Ss continued to approach the food tray after the fifth or rewarded bar press but gradually ceased to make the approach response after the first, second, third and fourth bar presses.

The results support a discrimination analysis of the effects of partial reinforcement in extinction.

⁵ It should be made clear that the present hypothesis applies to periodic, aperiodic, and variable-ratio reinforcement as well; fixed-ratio reinforcement simply lends itself to a more precise experimental analysis. In fact, as is consonant with other data (8, 9), the present position generates the prediction that aperiodic and variable ratio reinforcement will bring about greater resistance to extinction than a fixed-ratio schedule because of the virtual elimination of any sequential or response produced cue which may be present during fixed-ratio reinforcement.


REFERENCES 


1. Denny, M. R., & Adelman, H. M. Elicitation theory: I. An analysis of two typical learning situations. Psychol. Rev., 1955, 62, 290-296.

2. Estes, W. K. Learning. Annu. Rev. Psychol., 1956, 7, 1-38.

3. Jenkins, W. O., McFann, H., & Clayton, F. L. A methodological study of extinction following aperiodic and continuous reinforcement. J. comp. physiol. Psychol., 1951, 44, 155-167.

4. Jenkins, W. O., & Rigby, M. K. Partial (periodic) vs. continuous reinforcement in resistance to extinction. J. comp. physiol. Psychol., 1950, 43, 30-40.

5. Jenkins, W. O., & Stanley, J. C. Partial reinforcement: a review and critique. Psychol. Bull., 1950, 47, 193-234.

6. Maatsch, J. L. Reinforcement and extinction phenomena. Psychol. Rev., 1954, 61, 111-118.

7. Mowrer, O. H., & Jones, H. M. Habit strength as a function of the pattern of reinforcement. J. exp. Psychol., 1945, 35, 293-311.

8. Skinner, B. F. The behavior of organisms. New York: Appleton-Century, 1938.

9. Skinner, B. F. Some contributions of an experimental analysis of behavior to psychology as a whole. Amer. Psychologist, 1953, 8, 69-78.


(Received November 30, 1956) 





Journal of Experimental Psychology 
Vol. 54, No. 6, 1957 


JUDGMENTS OF VISUAL VELOCITY AS A
FUNCTION OF LENGTH OF OBSERVATION TIME ¹


ALVIN G. GOLDSTEIN ²


University of Missouri 


In an article devoted to a discussion of adaptation and negative after-effects as found in several sense modalities, Gibson states “. . . a prolonged experience of movement in any part of the visual field should decrease in speed in the direction of becoming neutral . . .” (5, p. 235). He then describes a simple but effective demonstration of this prediction but does not report the results in detail. To the writer’s knowledge this interesting phenomenon of visual movement has not been further investigated, nor has it been mentioned in the literature since the publication of Gibson’s article.

This same phenomenon was observed somewhat accidentally by the writer and experiments were initiated and completed before the Gibson article was discovered. Two of these experiments are reported here. The effects of two variables were explored: duration of viewing time and stimulus velocity.


METHOD


Visual display and apparatus.—An endless belt of black and white
vertical stripes was constructed out of heavy, translucent, white
paper. Each stripe was 3.8 × 30.5 cm. When viewed by S the complete
display was rectangular, 1.5 × 45.7 cm., with the long axis
horizontal. A small illuminated disc, the fixation point (1.8 cm. in
diameter), was centered 5 cm. above the upper edge of the display.
The stripes and fixation point were homogeneously transilluminated.
The brightness of the white stripes and the fixation point was
O45 ft.-L. The black stripes were opaque. No other items were
visible to S during the experiment. This latter condition was made
possible by having S seated in a black cardboard booth which
enclosed him on three sides and above. A rectangular opening was cut
into the front panel of the booth permitting S to see only the
display. A conventional electronic timer controlled the length of
time the display was illuminated and visible to S.

¹ The author wishes to acknowledge the valuable assistance of
Lawrence K. Williams who participated in all phases of the study.

² This research was carried out while the author was a staff member
of the Army Medical Research Laboratory, Psychology Department,
Fort Knox, Kentucky.

The belt was stretched between two vertical rollers. A constant
speed motor propelled the rollers (and the belt) at various
predetermined velocities by means of interchangeable gears.

The Ss were required to reproduce the speed of the moving band of
stripes by moving a stylus across an electrical contact bar. This
response was translated into a time score by means of a Standard
Electric timer connected in series with the contact bar and stylus.
The contact bar was 38.3 × ... cm., mounted beneath a slot,
42.0 × ... cm., cut into ...-cm. Lucite. An electrically insulated
section at both ends of the contact bar made it possible for the
stylus to be held there without activating the timer. The stylus was
approximately the same size and shape as a pencil. With this
arrangement the total time taken by S to move the stylus from the
left to the right end of the contact bar was recorded by the timer
and could be read to the nearest .01 sec.

In general, the usual method previously employed in studies of
relative movement perception involved a standard and a comparison
stimulus presented simultaneously to S or separated by a short time
interval (3, 4, 6). The comparison stimulus was typically adjusted
by S or by E under S's direction to a velocity which in some way was
related to the standard velocity (e.g., equal to, just faster than,
etc.). The unique method of velocity reproduction employed in the
present study was developed for two reasons. First, experimental
control of the duration of S's exposure to visual stimulus movement
was an important consideration; this is not easily accomplished in
the standard-comparison method (unless a method of limits is used
which brings along many new problems, e.g., long experimental
sessions and expensive movement producing apparatus). Second, the
effect of duration of exposure to only one moving stimulus was the
immediate aim of the experiment. It is obvious that the
standard-comparison method does not meet this requirement. Use of
the arm-movement technique of reproducing velocity permits accurate
control of duration and presentation of an isolated stimulus. In
addition, this technique is more efficient in the present problem
than a comparison method. For example, apparent velocity following
any particular observation duration can be obtained from a minimum
of one trial per S, while this same information obtained by the
comparison method would take at least two trials per S.

The center of the display was located directly 
in front of S, approximately at eye level, at a 
distance of 274 cm. The plane of movement 
was at right angles to S’s line of sight. As 
viewed by S the direction of movement was 
always from left to right. The contact bar 
assembly was placed at a comfortable height 
in front of the seated S. Its long axis was 
parallel to the plane of movement of the visual 
display. The left or starting end of the con- 
tact bar was slightly to the left of the S’s midline. 

Subjects.—Twenty-four men were employed
in Exp. I and 5 in Exp. II. Due to the requirements
of the task, all Ss were selected on the
basis of right-hand preference and visual acuity 
of better than 20/40 for each eye. 

Procedure.—Unless otherwise noted the procedures
of Exp. I and II were identical. For
a period of at least 30 sec. before every trial S 
was light adapted to a brightness approximately 
equal to the brightness level of the white stripes 
of the display. The adapting light, an 8 × 12
in. opal glass illuminated from the rear, was 
located to the left of S about 5 in. from his 
eyes. The task of S was to fixate the small 
disc above the moving stripes, and then, im- 
mediately after the illumination was extinguished 
and the display was no longer visible, to move 
the stylus over the contact bar at the same 
speed as the stimulus band had been seen to 
move. The S could not see the bar or stylus 
while making his response. 

When S was familiar with the use of the 
stylus and understood the task, at least one 
practice trial was run. Following this, each 
S was presented with three velocities, 2.38, 
4.76 and 14.28 cm. per sec. (or approximately
.5°, 1° and 3° per sec., respectively). The
slowest velocity was well above the lower 
threshold of visual movement; the fastest 
velocity did not produce the appearance of 
blur. For simplicity, in the remainder of this
paper the stimulus velocities will be referred
to in terms of the relative velocities: Slow,
Medium and Fast. Each velocity was observed
for 2, 15, 30 and 60 sec. making a total of 12
trials per S. There was a minimum interval
of 2 min. between trials. The order of presenting
the various conditions was derived from a
replicated 12 × 12 Latin square.
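
The correspondence between the linear belt velocities and the approximate angular velocities quoted above can be checked from the stated viewing distance of 274 cm. The following is a minimal sketch, assuming movement at right angles to the line of sight and a target near the center of the display; the function name and constant are illustrative and are not part of the original report.

```python
import math

# Viewing distance stated in the Method section (cm).
VIEWING_DISTANCE_CM = 274.0


def angular_velocity_deg_per_sec(linear_cm_per_sec, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle (degrees) swept in one second by a target moving
    at right angles to the line of sight, near the line of sight."""
    return math.degrees(math.atan(linear_cm_per_sec / distance_cm))


# The three belt velocities used in both experiments (cm per sec).
velocities = {"Slow": 2.38, "Medium": 4.76, "Fast": 14.28}

for label, v in velocities.items():
    angle = angular_velocity_deg_per_sec(v)
    print(f"{label}: {v} cm/sec is about {angle:.2f} deg/sec")
```

Rounded to one decimal place, the three values agree with the approximate figures of .5°, 1° and 3° per sec.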

In Exp. II observation times of 8 and 22 
sec. were included in addition to the four ob- 
servation periods employed in Exp. I. Each 
S was presented with 18 conditions (3 stimulus
velocities × 6 observation periods) in a random
order on three consecutive days resulting in a 
total of 54 trials per S. The first day was 
considered to be a practice session; the results 
for Days 2 and 3 were analyzed. 


RESULTS 


The measure used in the analyses 
of both experiments was the time in 
seconds taken by S to move the 
stylus across the full extent of the 
contact bar under each combination 
of viewing time and physical velocity. 
For example, the longer times indicate 
a slower arm movement, which is 
assumed to reflect a decrease in 
perceived velocity (or, possibly, a 
decrease in the duration of the motion 
of a single stripe within the rectangu- 
lar display). 


The data of both experiments
tended to be non-normal in form and
to exhibit nonhomogeneity of variance.
For these reasons the results
were subjected to a nonparametric 
analysis of ranked data. In addition, 
the original data were also trans- 
formed into log values and were 
analyzed by means of a conventional 
analysis of variance. Since the re- 
sults of the analysis of variance were 
not essentially different from the 
nonparametric analysis, only the re- 
sults of the nonparametric treat- 
ment will be presented. 
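
The report does not name the nonparametric test. A chi-square computed from rank sums of repeated measurements is consistent with Friedman's analysis of variance by ranks, and the sketch below assumes that test. The scores are hypothetical arm-movement times invented for illustration; they are not the experimental data.

```python
def average_ranks(values):
    """Rank values from 1 to k; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi_square(scores):
    """scores[s][c] is the score of subject s under condition c.
    Returns (chi_square, degrees_of_freedom), without a tie correction."""
    n = len(scores)
    k = len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for c, r in enumerate(average_ranks(row)):
            rank_sums[c] += r
    chi_sq = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi_sq, k - 1


# Hypothetical times (sec) for four subjects at 2, 15, 30 and 60 sec of observation.
hypothetical_times = [
    [9.8, 10.4, 10.9, 10.8],
    [11.2, 11.9, 12.6, 12.4],
    [8.7, 8.9, 9.6, 9.7],
    [10.1, 10.0, 10.8, 10.9],
]
chi_sq, df = friedman_chi_square(hypothetical_times)
print(f"chi-square = {chi_sq:.2f} with {df} df")
```

With four observation times the statistic has 3 df, and with six it has 5 df; the chi-square values reported below are consistent with those degrees of freedom.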

The data of the two experiments 
offer evidence that there is a sys- 
tematic relation between the duration 
of the exposure time to a moving 
stimulus and its apparent velocity. 







For every stimulus velocity the maxi- 
mum (60-sec.) observation time re- 
sulted in a slower mean apparent 
velocity than that obtained in the 
minimum (2-sec.) observation time. 
In most cases the mean apparent 
velocity obtained during any one 
observation time was slower than 
that obtained during an observation 
time of shorter duration. In other 
words, the longer observation times 
in both experiments resulted in slower 
apparent velocity judgments (see 
Fig. 1). 

Figure 1 shows the effect of ob- 
servation time on apparent velocity 
in Exp. I (upper curve) and Exp. II 
(lower curve). Each point on the 
graph is an average of the scores 
obtained in the three velocity con- 
ditions. 

From the upper curve in Fig. 1 
it can be seen that there is a decrease 
in apparent velocity which continues 
up to the 30-sec. observation period. 
Between the 30- and 60-sec. observa- 
tion periods there is no appreciable 
difference in apparent velocity. The 
nonparametric test of the differences
between rank sums of the four observation
times was significant beyond
the .01 level (χ² = 12.60).

The effect of observation time within each of the three physical
velocities is shown in Fig. 2. The decrement in apparent velocity is
present in the data in each case. In the slowest physical velocity
condition most of the decrement in apparent velocity occurred
between the 2- and 15-sec. observation periods. In the Medium and
Fast velocity conditions there is a more gradual slope between the
2- and 30-sec. observation periods.

Fig. 1. Change in time taken to traverse contact bar as a function
of observation time for combined physical velocities. Each point in
upper curve based on 72 scores; each point in lower curve based on
30 scores. (Ordinate: mean time of arm movement, sec.; abscissa:
observation time, sec.)

Fig. 2. Change in time taken to traverse contact bar as a function
of observation time for three physical velocities. Each point based
on 24 scores. (Ordinate: mean time of arm movement, sec.; abscissa:
observation time, sec.)


Each of the curves of Fig. 2 was tested separately for significance
by a nonparametric analysis of ranks. Of the three analyses, the
differences between rank sums of the observation times within the
Slow and Fast velocity conditions were significant beyond the .05
level (χ² = 9.95 and 12.95). Although the rank differences between
the observation periods in the Medium velocity condition did not
reach significance (χ² = 6.20; P = .10), it should be noted that
apparent velocity decreases from the 2-sec. observation period to
the 60-sec. period.

The data of Exp. II were collected on two consecutive days (as
reported above). A preliminary analysis gave no support for the
hypothesis that the results from these two days were significantly
different from each other. Therefore, in the analyses presented in
this section the data of the two days are combined and treated
together.

The lower curve of Fig. 1 presents 
the results for combined velocities. 
The highest point on the curve, 
representing the slowest apparent 
velocity, occurs at the 22-sec. ob- 
servation period. The slope of the 
curve from the 2-sec. observation 
period to the 22-sec. observation 
period is steeper than between the 
22- and 60-sec. periods, which sug- 
gests once again that the effect of 
length of observation time reaches a 
plateau in the neighborhood of 30 
sec. of observation time. The non- 
parametric test of the differences 
between rank sums of the six observation
periods was significant
beyond the .05 level (χ² = 12.65).

The graphical presentation of the results of Exp. II for each
physical velocity is shown in Fig. 3. For each of the three physical
velocities the curves suggest that the main effect of time of
observation occurs after approximately 8 sec. of observation time
and that between 2 and 8 sec. very little of the effect can be
observed by the method used in this study. Analyses of the data
comprising each of the three curves resulted in chi-square values
for the Medium and Fast velocity conditions which were significant
beyond the .02 level (14.74 and 14.28 respectively). Although the
ranks of the Medium velocity differ significantly, it should be
pointed out that the rank of the 2-sec. period is higher than the
rank of the 8-sec. period which is not in accordance with the
results obtained for the Slow and Fast velocity. This was not
readily evident in the graphical presentation (Fig. 3), since the
numerical difference between the means of these two conditions is
.01 sec. (10.96 sec. vs. 10.95 sec.). The chi square of ranks for
the Slow velocity did not reach significance (P between .10 and
.20).

Fig. 3. Change in time taken to traverse contact bar as a function
of observation time for three physical velocities. Each point based
on 10 scores. (Ordinate: mean time of arm movement, sec.; abscissa:
observation time, sec.)


DISCUSSION


The findings of the two experiments 
provide evidence that a decrease of 
apparent velocity of a constant speed 
stimulus occurs as a function of duration 
of observation. Increasing the duration 
of the observation period between 2 and 
60 sec. resulted in small but significant 
differences in the time taken by Ss to 
move a stylus a definite distance in 
response to a moving visual stimulus. 
This response on the part of S is assumed 
to have a monotonic relation to his 
perception of the velocity. If this is 
true, the arm movement velocity may
be used as a measure of the S’s percep- 
tion. However, only the differences 
between arm movement scores within 
each physical velocity condition were
of interest. No statement can be made 
about the absolute amount of change in 
apparent velocity as a function of 
observation time. To do this, it would 
be necessary to determine the exact 
relation existing between arm movement 
velocity and apparent velocity. 

For many years investigators have
studied the negative aftereffects of
seen movement, i.e., apparent movement
of physically stationary objects in a
direction opposite to the prior stimulation
by objective movement under
certain conditions of visual fixation
(1, 7, 8, 9; 2, pp. 588-607). Gibson,
in discussing his results which indicated 
that adaptation to visual movement 
occurred, implied that an explanation 
in terms of negative aftereffects could 
be given (5). This argument, in slightly 
modified form, is presented here. 

It is conceivable that the process 
which ultimately shows itself as ap- 
parent aftermovement in the opposite 
direction from the inducing objective 
movement may be developing during 
the period of observation of the objective 
motion. If this were the case, then 
in the S who is exposed to visual move- 
ment there would be a developing process 
“moving” opposite to the movement of 
the visually presented stimulus. It would
be plausible to assume that this opposing 
process could give rise to a reduction 
in the apparent velocity of the observed 
movement. In other words, if visual 
movement aftereffects and the reported 
decrements in apparent velocity are 
intimately related, then it would be 
expected, in an ideal situation, that 
aftereffects would not be elicited unless
there was a sufficient period of observation
to produce a decrement in apparent
velocity.


SUMMARY 


Using an arm movement response of S as an indicator of perceived
velocity two experiments were conducted to determine whether
exposure to moving visual stimuli for various duration times would
result in different apparent velocities. In Exp. I physical
velocities of approximately 2.4, 4.8 or 14.3 cm. per sec. were
viewed for 2, 15, 30 and 60 sec. Experiment II was similar to Exp. I
except that additional observation periods of 8 and 22 sec. were
included.

In both experiments it was found that increasing the duration of
exposure to a constant velocity visual stimulus resulted in changes
of apparent velocity. In general, there was a decrease in apparent
velocity with an increase in observation time: from 2 to 8 sec. of
observation little change occurred; from 8 to approximately 30 sec.
of observation apparent velocity decreased, and from 30 to 60 sec.
of observation there was no change or a slight tendency for
apparent velocity to increase but not to the level obtained under
2 sec. of observation. These effects were present to some degree
under the three stimulus velocities used in the experiments.

REFERENCES 


1. Addams, R. An account of a peculiar optical phaenomenon seen
after having looked at a moving body. Phil. Mag., 1834, 3 ser., 5,
373-374.

2. Boring, E. G. Sensation and perception in the history of
experimental psychology. New York: D. Appleton-Century, 1942.

3. Brown, J. F. The visual perception of velocity. Psychol. Forsch.,
1931, 14, 199-232.

4. Ekman, G., & Dahlbäck, B. A subjective scale of velocity. Rep.
Psychol. Lab. Univer. Stockholm, 31, Feb. 1956.

5. Gibson, J. J. Adaptation with negative after-effect. Psychol.
Rev., 1937, 44, 222-244.

6. Gibson, J. J., & Smith, O. W. The perception of motion in space.
In Proc. Sympos. Physiol. Psychol., ONR Sympos. Rep. ACR-1, 1955,
Pp. 117-124.

7. Johansson, G. Studies on motion-after-effects I. Rep. Psychol.
Lab. Univer. Stockholm, 1955, No. 4.

8. Johansson, G. Studies on motion-after-effects II. Rep. Psychol.
Lab. Univer. Stockholm, 1955, No. 13.

9. Wohlgemuth, A. On the after-effect of seen movement. Brit. J.
Psychol. Monogr. Suppl., 1911, No. 1, 1-117.

(Received November 30, 1956) 





Journal of Experimental Psychology 
Vol. 54, No. 6, 1957 


EFFECT OF AMOUNT OF INTERPOLATED LEARNING AND
TIME INTERVAL BEFORE TEST ON
RETENTION IN RATS¹


JUDITH P. FRANKMANN 


Indiana University 


Studies of retroactive inhibition and 
retention have used relatively com- 
plex experimental situations compared 
to those dealing with simple acquisi- 
tion, extinction, and recovery phe- 
nomena. In order to determine 
whether a single theory can handle 
both retention and these other phe- 
nomena, a simple situation must be 
arranged in which the stimuli and 
responses can be identified through- 
out the experiment. Up to the 
present time with few exceptions 
both the stimuli and the reinforcement
contingencies of the original
learning (OL) situation are different 
from those of interpolated learning 
(IL). This is found in work with 
rats as well as with human Ss. For 
example, in two of the more recent 
studies by Waters and Vitale (7) and 
Marx (6) 14-unit T mazes were used 
during OL and different mazes were 
introduced for the interpolated ses- 
sions. In order to avoid such com- 
plications, in the present experiment 
a simple T maze was used. The 
original response was that of going 
to one side of the maze and the inter- 
polated response was a reversal to the 
other side. In this way the stimulus 
situation was held constant from
series to series while reinforcement
contingencies were varied.

Predictions from statistical learning theory led to the choice of
two independent variables, number of IL trials and duration of rest
interval before relearning the original response. Before developing
these predictions the major assumptions underlying the theory will
be reviewed. Detailed discussions have been reported earlier by
Estes and Burke (3) and Estes (2).

¹ This article is based on a thesis submitted in partial
fulfillment of the requirements for the degree Master of Arts. The
writer is indebted to Dr. W. K. Estes for guidance in planning the
research.


The stimulating situation to which S is 
exposed is considered to be represented by a set 
of stimulus elements which depend on many 
environmental components. Within an ex- 
perimental period a subset of these elements 
is available to S. On each trial S samples the 
subset and all of the elements sampled become 
conditioned to the response that is reinforced. 
The behavior that can occur is divided into 
mutually exclusive and exhaustive response 
classes, and each element can be conditioned 
to one and only one response at any given time. 
Under these conditions the probability of a 
particular response is given by the proportion 
of elements in the available subset conditioned 
to that response. 

It is further assumed that the environmental 
components are in a constant state of random 
fluctuation. Between experimental periods there
will be an interchange of elements from the
available subset and those previously not
available. As the fluctuation process continues
the proportions of conditioned elements
in the two subsets will approach equality.


All of the elements sampled during 
OL of the T maze will have become 
conditioned to the original response. 
On each IL trial the stimulus sample
will become conditioned to the reversal
response. As the series continues,
an increasing number of elements
previously conditioned to the
original response will become conditioned
to the reversal response
and the probability of the former will
decrease. Estes and Burke (3) and
Estes and Straughan (4) have shown
that the curve of learning of the
reversal response is a negatively
accelerated exponential function.
Therefore, the amount of retention 
should be a negatively accelerated 
decreasing function of the number 
of IL trials when measured in terms 
of the probability of the original 
response at the beginning of the 
retention test. 
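
The prediction for number of IL trials can be illustrated with a simple linear-operator form of the sampling assumptions stated above. This is a sketch only: it assumes that a constant proportion theta of the available elements is sampled on each IL trial and that every sampled element becomes conditioned to the reversal response. The value of theta, the starting probability, and the function name are hypothetical and are not taken from the experiment.

```python
def original_response_probability(p_start, theta, n_il_trials):
    """Expected probability of the original response after n_il_trials
    reinforced reversal trials, assuming each trial reconditions a
    proportion theta of the available elements (hypothetical parameter)."""
    return p_start * (1.0 - theta) ** n_il_trials


# Hypothetical values: all sampled elements conditioned to the original
# response at the end of OL, and a sampling proportion of .2 per trial.
P_START = 1.0
THETA = 0.2

for n in (0, 4, 8, 12):
    p = original_response_probability(P_START, THETA, n)
    print(f"after {n:2d} IL trials: p(original response) = {p:.3f}")
```

Successive trials remove a constant fraction of what remains, so the drop from 0 to 4 trials is larger than the drop from 8 to 12 trials, which is the negatively accelerated decreasing function described in the text.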

In order to predict the effect of the length of the rest interval
between IL and the test for retention, one must first look at the
elements conditioned to the reversal response at the end of the IL
session. The subset which was available during this training must
contain a higher proportion of these conditioned elements than the
subset that was not available, but as a result of random
interchanges during the rest interval, the proportions in the
available and unavailable subsets will tend toward equality.
Estes (2) has shown that spontaneous recovery in probability of the
original response will be a negatively accelerated function of
time. Thus, the amount of retention, measured in terms of
probability of the original response at the start of the test
period, should be a negatively accelerated increasing function
of the duration of the rest interval.
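
The fluctuation argument can be sketched numerically. The example below is a simplified illustration and not the formulation given by Estes (2): it assumes that the available and unavailable subsets are of equal size and that a fixed proportion j of each is exchanged in every unit of time. The starting proportions, the value of j, and the function name are hypothetical.

```python
def recovery_curve(p_available, p_unavailable, j, n_steps):
    """Proportion of available elements conditioned to the original
    response after each time step, assuming equal-sized subsets that
    exchange a proportion j of their elements per step (hypothetical)."""
    curve = [p_available]
    x, y = p_available, p_unavailable
    for _ in range(n_steps):
        # Simultaneous exchange: each subset moves toward the other.
        x, y = x + j * (y - x), y + j * (x - y)
        curve.append(x)
    return curve


# Hypothetical state at the end of IL: few available elements remain
# conditioned to the original response, most unavailable ones still are.
curve = recovery_curve(p_available=0.1, p_unavailable=0.9, j=0.1, n_steps=20)

for t in (0, 1, 5, 20):
    print(f"time step {t:2d}: p(original response) = {curve[t]:.3f}")
```

Under these assumptions the probability of the original response rises toward the mean of the two starting proportions by smaller and smaller steps, so a longer rest interval yields more recovery, with diminishing returns.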


Two values for each independent variable were chosen. The numbers
of IL trials were selected on the basis of the Lauer and Estes
study (5) of learning in a T maze with a correction procedure. An
experiment on spontaneous recovery of the bar-pressing response by
Ellson (1) suggested the durations of the rest intervals.


METHOD


Subjects.—The Ss were 40 female hooded rats from the Indiana
University colony. The ages ranged between 5 and 7 mo. at the
beginning of the experiment. During the course of the experiment
4 Ss were discarded because of failure of the timing mechanism.

Apparatus.—The main apparatus was a black, single-unit T maze
covered by Plexiglas sections. The maze was 54 in. high with the
start box and arms 4 in. wide and the stem 24 in. wide. The distance
from the rear wall of the start box up to the choice point was 20}
in. The length of the arms from the door of one goal box to the door
of the other was 304 in. Each goal box was an additional 15} in.
long. A guillotine door separated the start box from the stem of the
maze. One-way swinging doors of gray cardboard permitted entrance
into the goal boxes. Photoelectric cell units were mounted 44 in.
beyond the choice point in each arm and also in each goal box 24 in.
from the entrance. The photoelectric cells controlled two .01-sec.
Standard Electric timers. Each arm unit started one timer and both
timers were stopped when S entered the "correct" goal box. Brass
feeding cups held the food pellets in the goal boxes. In addition,
two retaining boxes were used, one for holding Ss during the
intertrial interval and another for special feeding purposes.
Design.—On Day 1 all Ss had 12 trials on a left-right
discrimination problem. Half of each experimental group was trained
to the right and half to the left. On Day 2 each of the four groups
was subjected to either 4 or 12 trials of IL. Finally, each of these
groups was tested for retention of the original response either
5 min. or 24 hr. after the completion of IL.

Procedure.—The Ss were randomly divided into four groups of 10
each. A random running order was determined which held throughout
the experiment. The Ss were assigned to home cages in groups of four
according to running order so that the 23-hr. feeding schedule could
be maintained as constant as possible.

Animals were handled daily for a month and started on a 23-hr. food
deprivation schedule 3 wk. before Day 1 of the experiment. During
this time each S was trained to eat 45-mgm. food pellets from the
food cups later used in the maze.

Over a period of three days each S had 14 forced-choice trials
without reinforcement as training on passing through the swinging
doors. For each S half of the trials were to the right and half to
the left side of the maze.


In order to equate the amount of food ingested at the beginning of
the test period during the experiment proper, Ss in the 24-hr.
interval groups were given 12 pellets in the special feeding box
5 min. before the session began. The Ss with the 5 min. interval and
4 interpolated trials received 8 additional pellets in the feeding
box at the end of the IL session. The Ss with the 5 min. interval
and 12 IL trials were placed in the box for a comparable time but
had no pellets.

Fig. 1. Mean reciprocal running time during the test series with
respect to number of interpolated trials and time interval before
retention test. (Ordinate: mean reciprocal running time, sec.;
abscissa: test trials.)

The intertrial interval was 30 sec. during all 
experimental periods. A correction procedure 
held throughout the experiment. An incorrect 
response was scored when S touched the door 
of the incorrect goal box with his nose or foot.
Two response measures were used: (a) per- 
centage of correct responses, and (b) the re- 
ciprocal of running time. It should be noted 
that the running time measure was not the 
usual one taken from the start box to the goal 
box. The usual measure would have a high
correlation with probability of correct response
during the original acquisition series but not
on the reversal series. The measure used here
was chosen so that it would correlate highly
on all series with probability of correct response,
the measure upon which the model is based.
The times were measured from the moment
S first entered the correct arm on a given trial
until he entered the correct goal box. It was
assumed that the reciprocal running time measure
would vary directly with probability of
the correct response.


RESULTS AND DISCUSSION


The results on the relearning (RL) trials for the 4- and 12-trial
IL groups are shown in Fig. 1 and 2. Each point represents the mean
of 18 Ss. The difference between the two groups is decidedly more
striking with the reciprocal running time measure in Fig. 1 than
with the percentage of correct responses measure in Fig. 2. This is
not surprising, however, in view of the greater sensitivity of the
former measure. The important point is that, in both cases, Ss with
12 IL trials show less retention, i.e., lower scores on early RL
trials, than those with only 4 interpolated trials. Since the four
subgroups were found to differ considerably during OL, analysis of
covariance was used in all comparisons of performance on the RL
trials. The control measure was the mean for each S on the 12 OL
trials. Separate analyses were made for the first and second halves
of the RL series. With respect to reciprocal running time, the
difference between the 4 and 12 interpolated trial groups on the
first six trials is significant at the .01 level, as shown in
Table 1. During the last six trials the difference is no longer
significant. Means and SD's for OL and the first half of the RL
trials are tabulated with respect to both response measures in
Table 2.

Fig. 2. Percentage of correct response per trial during the test
series with respect to number of interpolated learning trials and
time interval before retention test. (Ordinate: % correct response;
abscissa: trials.)

Comparisons between the 5-min. Ss and those with a 24-hr. interval
are presented in Fig. 1 and 2. It is clear, particularly with the
running speed measure, that the 5-min. group showed less retention,
i.e., lower scores on early test trials. The difference is
significant at the .01 level for the first half of the test series
(see Table 1), and at the .05 level for the second half. The
differences again are not significant for the percentage of correct
responses. Thus predictions concerning number of IL trials and time
interval before test are confirmed statistically in the case of
running speed. Considering the frequency measure, group differences
are in the expected direction.


TABLE 1

Analysis of Covariance of Reciprocal Running Time Measures for
RL Trials 1-6

Source
Number of trials
Time interval
Trials × Time
Within groups
Total

* P = .01.


TABLE 2

Means and SD's for the 12 OL Trials and RL Trials 1-6

Column headings: Group; IL (Trials, Time); Total Correct Responses
(OL, RL); Mean Reciprocal Running Time (OL, RL); Mean and SD.

Time:   5 min.   5 min.   24 hr.   24 hr.
        x44      x 44     8.56     8.22
        1.26     1.64     1.26     1.40
        1.02     0.92     1.22     1.09
        0.22     0.24     0.18     0.23
        0.18     0.24     0.29     0.21

Note.—The mean correct responses during 4 IL trials was 1.11
(SD = 1.28) for Group I and .78 (SD = .91) for Group III. The
comparable mean reciprocal running times were .95 (SD = .22) and
1.10 (SD = .14). The mean correct responses during 12 IL trials was
7.78 (SD = 1.12) for Group II and 7.89 (SD = 1.59) for Group IV.
The comparable mean reciprocal running times were 1.26 (SD = .14)
and 1.37 (SD = .24).


SUMMARY


Statistical learning theory predicts that retention of a response
will be (a) inversely related to the number of IL trials and (b)
directly related to the duration of the time interval between IL
and retention test.

Forty female hooded rats were trained on a left-right
discrimination in a T maze. The interpolated response was a
reversal to the opposite side from that of original learning. Two
values of IL, 4 and 12 trials, were studied in factorial
combination with two time intervals between IL and retention test,
5 min. and 24 hr. All Ss had 12 OL trials and 12 RL trials.

Retention of the original response was 
evaluated by means of a running speed measure 
and a frequency measure taken over the 12 
RL trials. With the speed measure, group 
comparisons on both variables yielded significant 
differences in the predicted directions. With 
the frequency measure, all group differences 
were again in the predicted directions, but none 
were significant at the .05 level. 


REFERENCES 


1. Ellson, D. G. Quantitative studies of the interaction of simple habits: I. Recovery from specific and generalized effects of extinction. J. exp. Psychol., 1938, 23, 339-358.

2. Estes, W. K. Statistical theory of spontaneous recovery and regression. Psychol. Rev., 1955, 62, 145-154.

3. Estes, W. K., & Burke, C. J. A theory of stimulus variability in learning. Psychol. Rev., 1953, 60, 276-286.

4. Estes, W. K., & Straughan, J. H. Analysis of a verbal conditioning situation in terms of statistical learning theory. J. exp. Psychol., 1954, 47, 225-234.

5. Lauer, D. W., & Estes, W. K. Rate of learning successive discrimination reversals in relation to trial spacing. Amer. Psychologist, 1953, 8, 384. (Abstract)

6. Marx, M. H. The effects of cumulative training upon retroactive inhibition and transfer. Comp. Psychol. Monogr., 1944, 18, 62 pp.

7. Waters, R. H., & Vitale, A. G. Degree of interpolated learning and retroactive inhibition in maze learning. I. Animal subjects. J. comp. Psychol., 1945, 38, 119-126.


(Received December 7, 1956)





Journal of Experimental Psychology 
Vol. 54, No. 6, 1957 


CLASSICAL CONDITIONING OF HUMAN
PUPILLARY DILATION ¹


ARNOLD A. GERALL,² PHILIP B. SAMPSON,³ AND GERTRUDE L. BOSLOV


University of Rochester 


The equivocal nature of the results 
of classical conditioning of the human 
pupillary dilation response is gener- 
ally well known. Watson (16), Cason
(2), and Baker (1) have reported
successful conditioning of pupillary 
dilation. Careful repetitions of most 
of Cason’s work by Wedell, Taylor, 
and Skolnick (17) and of Baker’s 
studies by Hilgard, Miller, and Ohlson 
(7), however, have failed to substanti- 
ate these reports of pupillary response 
conditioning. Efforts to find stable 
conditioned pupillary constriction re-
sponses, likewise, have not met with
success (6, 14, 15, 18, 19). In view 
of the weight of persistent failures 
to obtain conditioning, Hilgard, Miller, 
and Ohlson remarked: “Unless satis- 
factory pupillary conditioning can 
be obtained, classical conditioning 
theory will have to be revised. It 
is important that the negative results 
should not be allowed to stand until 
every effort has been made to dis- 
cover more favorable conditioning” 
(7, p. 689). The present study is 
another attempt to identify the con- 
ditions sufficient for the modification 
of pupillary dilation by the classical 
conditioning procedure. 

On the basis of empirical and theoretical considerations, Spence (13) has suggested that the nature of the UCS used to evoke the dilation response should be subjected to investigation. He noted that Girden (3), Harlow (4), and Harlow and Stagner (5) have reported studies in which the dilation response was conditioned in animal Ss. In these three studies, electric shock was used as the UCS. In the studies in which no conditioning was found, the UCS was a change in brightness of a light source. A UCS involving shock, therefore, might have some properties relevant to the establishment of conditioned pupillary dilation that are not present in a UCS consisting of a change in illumination.

¹ This research was supported by the National Science Foundation under Grant NSF-G1791.
² Now at the University of Kansas.
³ Now at Tufts University.

One important property of shock, its reinforcement characteristic, has been emphasized in a theoretical controversy that has been waged in recent years over the variables necessary for the modification of responses innervated by the autonomic nervous system. Mowrer (10, 11) has formulated a two-factor theory of learning which states that reinforcement does not affect the conditioning of smooth muscle responses, the only necessary condition being the contiguity between the CS and UCS. Reinforcement is necessary, according to Mowrer, only for learning involving skeletal muscles. An earlier suggestion of this view was made by Skinner (12). Monistic reinforcement theorists such as Hull (8) and Miller (9), in contrast, hypothesize that a reinforcing stimulus should affect the acquisition of smooth and skeletal muscle responses in a similar manner.

The pupillary response is governed by the autonomic nervous system and, according to Mowrer’s two-factor theory, conditioning would be predicted to occur in all of the experiments mentioned above, since the necessary condition for modification, contiguity between the CS and UCS, was present in all of these studies. Reinforcement theorists, as Spence has noted, would not predict the establishment of CR’s in the studies with human Ss using a change in light intensity as the UCS. A change in light intensity, unless the intensity is very high, is not considered to have intrinsic reinforcement properties. The cessation of shock has been included in the class of stimulus events called reinforcers, and if it were given as part of the UCS pattern, then reinforcement theorists would predict the occurrence of pupillary conditioning in human as well as animal Ss.


METHOD


Apparatus.—A Ciné-Kodak Special camera with Kodak High Speed Infrared film was used to record changes in pupil diameter during the course of the experiment. The speed of the camera was 8 frames per sec. An f 2.7, 2.5-in. telephoto lens projected through the back panel of a rectangular box as shown in Fig. 1. The S sat on a chair in front of the box with his face flush against a cushioned port labeled “P” in Fig. 1. Padded chin and head bars were used to support and steady S’s head. The chin rests were adjusted so that S’s right eye was directly in front of the camera lens. The distance between the eye and front surface of the lens was 12 in.

Fig. 1. Diagram of apparatus used to photograph the pupillary response of human Ss: Stimulus lights that were used as part of the UCS pattern, SL; Infrared light sources, IR; Motion picture camera, C.

Light sources for the UCS and infrared 
illumination were located inside the panel. 
In this study, the UCS was obtained by turning 
off a 15-w. bulb in an aluminum reflector located 
6 in. in front of and 3 in. down from S’s nose. 
There were two light sources, both indicated
as SL in Fig. 1, but only the lower one was used.
Two 25-w. bulbs enclosed in aluminum coated
boxes with Wratten 87 filters provided infrared
light. These lamps, labeled IR in the diagram,
were located at a 45° angle on a plane perpen- 
dicular to the horizontal plane of S’s eyes. They 
were 7 in. away from the eye that they illumi- 
nated. 

Three Hunter Decade Electronic Timers
controlled the durations of the CS, UCS, and
photographic recording. A panel of switches
which enabled E to control the presentation
of the stimuli and the timers were located on a
separate table from the one on which the other
equipment rested.

Subjects.—A total of 50 Ss participated in the study. The data from four Ss were not used because of technical difficulties in processing of the film during the early phase of the study or because of errors made during the experimental period. Of the 46 remaining Ss, 33 were males and 13 were females. They were students or staff members at the University of Rochester.

Experimental stimuli.—The CS consisted of
the cessation of thermal noise and a 1000-
cycle/sec. tone approximately 50 db. above
threshold, generated by a commercial oscillator.
Both stimuli were delivered through earphones
in an aviator’s helmet which S wore. Except
during rest periods and when the CS was
presented, thermal noise, approximately 70 db.
above threshold, was presented through the
earphones. The thermal noise was loud enough
to mask all of the sounds from the timers,
camera, and switches.

The UCS was either light offset, electric shock, or a combination of light offset and shock. The lamp whose offset served as the UCS provided approximately 30 ft.-candles of illumination, and when it was turned off during the UCS period, a fraction of a foot-candle of light was available to S. Since this lamp contained a tungsten filament, neither its offset nor its onset was instantaneous. The shock was a 60-cycle/sec. AC voltage from two 6-v. filament transformers placed back to back. Inserted between the line and the transformers was a variac which determined the magnitude of the voltage. Silver electrodes with EKG electrode paste rubbed on them were placed on the forearm of S. The voltage level was established for each S by increasing the voltage in 2-v. steps until the shock was reported to be uncomfortable and something S would avoid. The voltage ranged from 10 to 40 v.

A delayed conditioning procedure with the 
CS and UCS terminating together was used. 
The duration of the CS was 3.0 sec., the CS- 
UCS interval 1.5 sec., and the UCS 1.5 sec. 
The camera was operating during the 3.0-sec. 
CS period. 
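
The timing figures above fix how many frames the camera exposes on each trial. The following is a minimal sketch of that arithmetic, assuming the 8 frames per sec. camera speed given under Apparatus; the helper name `frames_in` is hypothetical and not part of the original procedure.

```python
# Trial timing taken from the text; the helper name is hypothetical.
FRAMES_PER_SEC = 8       # camera speed reported in the Apparatus section
CS_DURATION = 3.0        # sec.
CS_UCS_INTERVAL = 1.5    # sec. from CS onset to UCS onset
UCS_DURATION = 1.5       # sec.


def frames_in(seconds, fps=FRAMES_PER_SEC):
    """Number of whole frames exposed in a period of the given length."""
    return int(round(seconds * fps))


total_frames = frames_in(CS_DURATION)
frames_before_ucs = frames_in(CS_UCS_INTERVAL)
frames_during_ucs = frames_in(UCS_DURATION)

# The CS and UCS terminate together, so the two periods fill the CS exactly.
assert CS_UCS_INTERVAL + UCS_DURATION == CS_DURATION

print(total_frames, frames_before_ucs, frames_during_ucs)  # 24 12 12
```

Under these figures, half of the frames on a conditioning trial are exposed before the UCS is scheduled to begin.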

General procedure.—The general experimental procedure was the same for all Ss. There were two experimental sessions for each S, usually on consecutive days but sometimes separated by one day. On Day 1, S was given instructions and the equipment was adjusted for photography, but no photographs were taken. The instructions stated that the response of the eye to various stimuli was being studied. The S was told that his eye was being photographed, and that he was to keep his head in one position and to fixate on the camera lens. The tone and light offset were presented and the importance of taking an uncomfortable shock was stressed. No conditioning trials were given on this day. The order of presentation of stimuli was as follows: T (tone alone), T, T, shock level adjusted, S (shock alone), S, S, T, T, T, T, T, T, S, T, T. Two rest periods of approximately 2 min. duration were interspersed between predetermined trials during the experimental period. This session lasted 30 min.

On the second day the order of presenting 
the experimental stimuli was: T, T, T, shock 
level adjusted again, T, T, T, T, T, C (condi- 
tioning), C, C, C, C, TT (test trial), C, C, 
Ts, Se Oe Ce GG Ore, & GS Bee 
(extinction), E, E, E, E, E, E, E, E, E, for Groups I, II, and IV. For the sensitization control groups, Groups III and V, the order was the same as the other groups but instead of conditioning trials in which the CS was paired with the UCS, the UCS was given alone. The mean intertrial interval was 1.5 min. Also, seven rest periods of approximately 1.5 min. were distributed between the same trials for all groups throughout the session. During a rest period, the thermal noise was turned off and Ss moved away from the panel.

Design.—The nature and presentation of 
the UCS was the only differential treatment 
imposed upon the five groups of Ss. The 
UCS condition provided each Group was as 
follows: Group I, Shock and Light Offset; 
Group II, Light Offset; Group III, Shock and 
Light Offset but never paired with the CS; 
Group IV, Shock; Group V, Shock but never 
paired with the CS. Groups III and V were
sensitization controls for determining if the
response to the tone, if it occurred, was due
to the pairing of the CS with the UCS or to
other factors, such as heightened emotionality
or startle tendencies.
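
The five-group design can be laid out as a small data structure. This is an illustrative sketch only: the `Group` class and its field names are hypothetical, the group sizes are those reported in the Design section, and the `expected_cr` line encodes the reinforcement-theory prediction as summarized in the introduction (CR’s only when a UCS containing shock is paired with the CS).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    label: str
    shock: bool          # UCS includes electric shock
    light_offset: bool   # UCS includes light offset
    paired: bool         # UCS paired with the CS (False = sensitization control)
    n: int               # Ss in the group at the end of the experiment


GROUPS = [
    Group("I", True, True, True, 14),
    Group("II", False, True, True, 7),
    Group("III", True, True, False, 7),
    Group("IV", True, False, True, 12),
    Group("V", True, False, False, 6),
]

# Reinforcement view: conditioning expected only with a paired shock UCS.
expected_cr = [g.label for g in GROUPS if g.shock and g.paired]
total_ss = sum(g.n for g in GROUPS)

print(expected_cr, total_ss)  # ['I', 'IV'] 46
```

The total of 46 agrees with the number of Ss whose data were retained.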

When the study was initiated, only the first three groups were incorporated in the design. The Ss were assigned randomly to them. Before half of these Ss were run, it was decided to include Groups IV and V. The remaining Ss and the additional Ss were then assigned randomly to the five experimental conditions. At the termination of the experiment, there were 14 Ss in Group I, 7 in each of Groups II and III, 12 in Group IV, and 6 in Group V.

The diameter of the pupils was measured on a microfilm reader that magnified the image on the 16-mm. film 27 times. This value was converted back to actual size by a factor which was obtained by placing a millimeter rule parallel to the vertical plane of the iris of several Ss and photographing it. Since this calibration was obtained for only a few Ss, there might be some error in using a single conversion factor due to variability in facial construction. This error, however, would be unsystematic and small.
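
The conversion from a reading on the microfilm reader to actual pupil size can be sketched as follows. The paper does not report the conversion factor itself, so the calibration and pupil readings below are hypothetical values chosen only to show the arithmetic; the function names are likewise assumptions.

```python
READER_MAGNIFICATION = 27  # enlargement of the 16-mm. film image, from the text


def conversion_factor(rule_length_mm, rule_length_on_reader_mm):
    """Actual millimeters represented by one millimeter on the reader screen."""
    return rule_length_mm / rule_length_on_reader_mm


def actual_diameter_mm(diameter_on_reader_mm, factor):
    """Convert a pupil diameter measured on the reader to actual size."""
    return diameter_on_reader_mm * factor


def film_image_mm(diameter_on_reader_mm):
    """Size of the same image on the film itself, before enlargement."""
    return diameter_on_reader_mm / READER_MAGNIFICATION


# Hypothetical calibration: a 10-mm. span of the rule measures 40 mm. on the reader.
factor = conversion_factor(10.0, 40.0)

# Hypothetical reading: the pupil image measures 22 mm. on the reader.
pupil_mm = actual_diameter_mm(22.0, factor)
print(pupil_mm)  # 5.5
```

Because a single factor is applied to every S, any difference in eye-to-lens geometry between Ss appears as the small unsystematic error the authors mention.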


RESULTS 


Since the camera speed was 8 frames per sec., 24 frames or individual pictures of the pupil were taken during a stimulus presentation period. The first half of the 24 frames was exposed during the period that the CS was presented alone. The second half of the 24 frames was taken during the interval when the CS and UCS were paired. During the adaptation, test and extinction trials, the entire 24 frames were exposed when the CS was presented alone. The experiment was concerned primarily with the average change in pupil size evoked by the CS in the various groups during the course of the procedure. Two measures were used in the analyses of this change. First, a measure of anticipatory pupillary responses was obtained by computing the average difference between the diameter of the pupil during the first three frames immediately following the onset of the CS and the last three frames before the onset of the UCS. This difference score represents the change in pupil diameter before the time the UCS was scheduled to be presented and, therefore, could be determined for every trial during the experiment. Secondly, an index of maximum pupil change was obtained during those trials when the CS was presented alone. It was computed by subtracting the average diameter of the pupil on the first 3 frames after the onset of the CS from the average of the 3 largest pupil diameters in succession among the entire 24 frames. Both of these measures are equal to zero if no change in pupillary diameter is evoked by the CS and are equal to a score larger than zero if the pupil dilates.

Fig. 2. Mean anticipatory dilation response of each of the five groups as a function of the number of trials during the three experimental phases (adaptation, acquisition, extinction). Photographs of the response were not taken on trials that are omitted on the abscissa. The subscript “T” following numbers on the abscissa indicates a test trial.
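
The two response measures can be expressed compactly in code. The sketch below is one reading of the definitions above, not the authors' own scoring procedure: it assumes 24 diameters per trial with UCS onset after frame 12, and it interprets “the 3 largest pupil diameters in succession” as the run of 3 successive frames with the largest mean. The function names and the sample trace are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def anticipatory_response(frames, ucs_onset_frame=12):
    """Mean of the last 3 frames before UCS onset minus mean of the first 3 frames."""
    baseline = mean(frames[:3])
    pre_ucs = mean(frames[ucs_onset_frame - 3:ucs_onset_frame])
    return pre_ucs - baseline


def maximum_response(frames):
    """Largest mean of 3 successive frames minus mean of the first 3 frames."""
    baseline = mean(frames[:3])
    windows = [mean(frames[i:i + 3]) for i in range(len(frames) - 2)]
    return max(windows) - baseline


# Hypothetical CS-alone trial (diameters in mm.): a slow dilation that levels off.
trace = [4.0] * 3 + [4.0, 4.1, 4.2, 4.3, 4.4, 4.5] + [4.6] * 3 + [4.8] * 12
flat = [5.0] * 24  # no change evoked by the CS

print(round(anticipatory_response(trace), 3), round(maximum_response(trace), 3))
print(anticipatory_response(flat), maximum_response(flat))
```

Because the first three frames form one of the candidate runs, the maximum measure as written here cannot fall below zero, which matches the statement that both scores are zero when the CS evokes no change.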

The anticipatory and maximum pupillary response changes for the five groups are depicted in Fig. 2 and 3, respectively. Analyses of variance performed on the data obtained during the last three trials of the adaptation period indicated that neither the anticipatory nor maximum pupillary responses of the five groups were different from each other. The F ratio was 1.87 (df = 4 and 41) for the anticipatory response measure and .7 (df = 4 and 41) for the maximum response measure. There was a significant decrease in both scores over the last three adaptation trials which reflects the diminution of the general startle reaction evoked by the CS. It will be recalled that each S had received a series of shocks shortly before this adaptation period started.
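
The degrees of freedom quoted for the Groups effect follow from the design: 5 groups give 4 df between groups, and 46 Ss give 41 df within groups. The sketch below shows that arithmetic together with a simple between-groups F ratio. It assumes a plain one-way layout, which the paper does not spell out, and the toy scores are invented for illustration; they are not data from the study.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Group sizes reported in the Design section.
group_sizes = [14, 7, 7, 12, 6]
df = (len(group_sizes) - 1, sum(group_sizes) - len(group_sizes))
print(df)  # (4, 41)

# Toy scores (not from the study) to exercise the function.
toy_result = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(toy_result)  # (3.0, 2, 6)
```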

During the acquisition phase, both anticipatory and maximum responses were analyzed and only the variability attributed to Groups was found to be statistically significant. The F ratios (df = 4 and 41) were 23.38 and 16.94 for the anticipatory and maximum responses, respectively. On the basis of the change in the anticipatory response during Trials 9 through 13 for Groups I and IV, an interaction between Groups and Trials might be expected. This interaction, however, was not found to be significant in analyses of variance including the five groups, probably because it represents only a small part of the data which essentially have a slope of zero. The acquisition functions in Fig. 2 and 3 indicate that Groups I and IV differ from the other groups. Individual t tests were performed on all comparisons of each group with the others during the acquisition period. The null hypothesis was rejected in this study (P < .05). The results of these tests show, as an inspection of the figures suggests, that Groups I and IV were not different from each other but each of them was statistically different from Groups II, III and V. Thus, only the two groups that had a UCS containing shock paired with the CS showed conditioned pupillary responses during acquisition. The two sensitization groups and the light offset alone group did not manifest any modification of the pupillary response.

Fig. 3. Mean maximum dilation response (in millimeters) for each of the five groups for trials on which the UCS was not presented.

During the extinction period, as in the acquisition phase, Groups I and IV were not statistically different from each other. However, they were statistically different from each of the other three groups. An analysis of variance of the anticipatory response did not show a significant Groups × Trials interaction or a Trials main effect. A statistically reliable decrease in the magnitude of the anticipatory response between the first and last extinction trials was found for Group IV but not for Group I. Extinction of the maximum response is more apparent, and both a Groups × Trials interaction and a Trials main effect were found to be statistically reliable.

The abrupt increase in the anticipatory and maximum responses on the second extinction trial in comparison to the first extinction trial was found to be statistically significant. At present, no information can be offered to account for this finding, and additional work is necessary to clarify its basis and significance.


Discussion 


According to the results of this experiment, only the two groups that had the CS followed by a UCS that included shock showed consistent modification of the pupillary response. A similar change in the pupillary response was not recorded in the two sensitization groups. This finding supports the conclusion that the modification of the pupillary response of Groups I and IV was due to the conditioning procedure and not to startle or sensitization factors. The performance of Group II was not different from that of Groups III and V. The absence of CR’s in Group II demonstrates that, with the procedure used in this study, a UCS composed of a decrease in brightness of a light source is not sufficient for the classical conditioning of the dilation response of the pupil.

These findings are consistent with those reported in most of the other studies of this problem and with the predictions made by Spence (13). The experiments by Hilgard, Miller, and Ohlson (7) and Wedell, Taylor, and Skolnick (17) used light offset alone as the UCS and no conditioning was found. Their results and those of the present experiment are not in agreement with the reports by Baker (1) and Cason (2). Watson (16) states that two out of four Ss were conditioned to a buzzer when an intense light was used as the UCS. None of the other studies used such a stimulus, but if the light had noxious characteristics, it is suggested that it might have properties functionally similar to shock and that the positive results would be replicated. There is also a consistency between the results of the present study and those reported by Girden (3), Harlow (4), and Harlow and Stagner (5) for animal Ss. To complete the comparison between human Ss and cats with reference to the role of the UCS in pupillary conditioning, it would be necessary to determine if pupillary dilation can be conditioned in cats with a UCS consisting of light offset alone. Such a study recently has been completed at the University of Rochester, and the results indicate that cats do not manifest CR’s when this type of UCS is paired with a CS for 19 trials, whereas cats presented with a UCS containing shock have dilation responses evoked by the CS before 10 trials, all other aspects of the procedure being the same.

With regard to the controversy concerning the role of reinforcement in the conditioning of the pupillary response, the results support a reinforcement view. Obviously, no statement can be made on the basis of this study as to whether reinforcement is a necessary variable for the conditioning process. The study merely demonstrates that a known reinforcer does affect the acquisition of a response governed by the autonomic nervous system. On the other hand, the results are not consistent with the aspect of two-factor theories which posits that reinforcement has no effect upon the classical conditioning of responses innervated by the autonomic nervous system. If this hypothesis were true, then Group II should have manifested CR’s and should not have been different from Groups I and IV.

Finally, it appears that the pupillary dilation response conditioned in this study is related to the fear rather than to the light reflex mechanism of this system. The UCR to a combination of light offset and shock was greater in magnitude and more rapid than the response to shock alone. Yet the magnitudes of the CR’s were not different for the groups with these different UCS. The similarity of the CR’s suggests that the same neural components that affect the pupil were involved in the modification process. In this case, it probably is that part of the sympathetic nervous system responsive to fear or pain stimuli.


SUMMARY 


An attempt was made to study the effect of different UCS upon the modification of the pupillary dilation response by a classical conditioning procedure. The three types of UCS used were (a) shock paired with light offset, (b) light offset alone, and (c) shock alone. Sensitization controls were run for the first and third types of UCS. The groups that had the UCS with shock manifested conditioning and extinction as measured by anticipatory and maximum pupillary dilation changes. The other three groups showed no consistent change and were not statistically different from each other.

A conclusion drawn from the results was that a known reinforcer does affect the conditioning of a response predominantly governed by the autonomic nervous system. The data also support the hypothesis that the equivocal nature of the results of earlier studies might be attributed to the absence of a reinforcing condition. Finally, it was suggested that a generalized fear reaction was modified in this experiment rather than a specific light reflex.


REFERENCES 


1. Baker, L. E. The pupillary response conditioned to subliminal auditory stimuli. Psychol. Monogr., 1938, 50, No. 3 (Whole No. 223).

2. Cason, H. The conditioned pupillary reaction. J. exp. Psychol., 1922, 5, 108-146.

3. Girden, E. The dissociation of pupillary conditioned reflexes under erythroidine and curare. J. exp. Psychol., 1942, 31, 322-332.

4. Harlow, H. F. The effects of incomplete curare paralysis upon formation and elicitation of conditioned responses in cats. J. genet. Psychol., 1940, 56, 273-282.

5. Harlow, H. F., & Stagner, R. Effect of complete striate muscle paralysis upon the learning process. J. exp. Psychol., 1933, 16, 283-294.

6. Hilgard, E. R., Dutton, C. E., & Helmick, J. S. Attempted pupillary conditioning at four stimulus intervals. J. exp. Psychol., 1949, 39, 683-689.

7. Hilgard, E. R., Miller, J., & Ohlson, J. A. Three attempts to secure pupillary conditioning to auditory stimuli near the absolute threshold. J. exp. Psychol., 1941, 29, 89-103.

8. Hull, C. L. Principles of behavior. New York: Appleton-Century-Crofts, 1943.

9. Miller, N. E. Learnable drives and rewards. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley, 1951.

10. Mowrer, O. H. Learning theory and personality dynamics. New York: Ronald, 1950.

11. Mowrer, O. H. On the dual nature of learning—a reinterpretation of “conditioning” and “problem solving.” Harvard educ. Rev., 1947, 17, 102-148.

12. Skinner, B. F. Two types of conditioned reflex and a pseudo-type. J. gen. Psychol., 1935, 12, 66-77.

13. Spence, K. W. Theoretical interpretations of learning. In C. P. Stone (Ed.), Comparative psychology. New York: Prentice-Hall, 1951.

14. Steckle, L. C. Two additional attempts to condition the pupillary reflex. J. gen. Psychol., 1936, 15, 369-377.

15. Steckle, L. C., & Renshaw, S. An investigation of the conditioned iridic reflex. J. gen. Psychol., 1934, 11, 3-23.

16. Watson, J. B. The place of the conditioned response in psychology. Psychol. Rev., 1916, 23, 89-116.

17. Wedell, C. H., Taylor, F. V., & Skolnick, A. An attempt to condition the pupillary response. J. exp. Psychol., 1940.

18. Young, F. A. An attempt to obtain pupillary conditioning with infrared photography. J. exp. Psychol., 1954, 48, 62-68.

19. Zaparenko, R. N. The conditioning of the pupillary light reflex. Unpublished master’s thesis, Univer. of Pittsburgh, 1939.

(Received December 5, 1956)











ANNOUNCEMENT 


JOURNAL OF EDUCATIONAL 
PSYCHOLOGY 


This journal will now be published by
the American Psychological Associa- 
tion. In 1958 it will become a bi- 
monthly; issues will appear in Febru- 
ary, April, June, August, October, 
and December. Contents include 
articles on problems of teaching, 
learning, and the measurement of 
psychological development. 


All back issues and subscriptions up to and including 
the May 1957 issue are the property of Warwick and 


York, Inc., 10 East Centre Street, Baltimore 2, 
Maryland. 


Subscription $8.00 Single 
(Foreign, $8.50) Copies, $1.50 


Direct new subscriptions and renewals to: 


AMERICAN PSYCHOLOGICAL ASSOCIATION 
Publications Office 
1333 Sixteenth Street, N. W. 
Washington 6, D. C. 


















